BANDIT PROBLEMS WITH LEVY PAYOFF PROCESSES 



ASAF COHEN AND EILON SOLAN 



Abstract. We study two-armed Levy bandits in continuous-time, which have 
one safe arm that yields a constant payoff s, and one risky arm that can be 
either of type High or Low; both types yield stochastic payoffs generated by 
a Levy process. The expectation of the Levy process when the arm is High is 
greater than s, and lower than s if the arm is Low. 

The decision maker (DM) has to choose, at any given time t, the fraction of 
resource to be allocated to each arm over the time interval [t, t + dt). We show 
that under proper conditions on the Levy processes, there is a unique optimal 
strategy, which is a cut-off strategy, and we provide an explicit formula for the 
cut-off and the optimal payoff, as a function of the data of the problem. We 
also examine the case where the DM has incorrect prior over the type of the 
risky arm, and we calculate the expected payoff gained by a DM who plays 
the optimal strategy that corresponds to the incorrect prior. 

In addition, we study two applications of the results: (a) we show how to 
price information in two-armed Levy bandit problems, and (b) we investigate 
who fares better in two-armed bandit problems: an optimist who assigns to 
High a probability higher than the true probability, or a pessimist who assigns 
to High a probability lower than the true probability. 



1. Introduction 

Consider a firm that has to determine, on an ongoing basis, how much to invest 
in the research of new technologies for its next line of products. The firm faces 
a tradeoff between exploration and exploitation: on the one hand, it can adopt 
the technology that seems most successful according to the research conducted 
so far, thereby exploiting its investment in research, but on the other hand, it 
could continue investing in various technologies, in the hope of finding an even 
better technology for its products. If the firm decides to stop investing in a given 
technology, then no information will be obtained on that technology, so even if it 
is actually better than the finally adopted technology, it will never be adopted. 

A similar tradeoff between exploration and exploitation arises, e.g., in the market 
of venture capital funds, where each fund has to decide in which start-up companies 
to invest, and in clinical trials, where pharmaceutical companies have to decide 
which new drugs or treatments to explore. 

To concentrate on the tradeoff between exploration and exploitation, one assumes 
that there are no exogenous factors that affect the firm's decision (such as 
new technologies or drugs that are introduced by competitors). The optimization 
problem that the firm faces has been modeled in the literature as a multi-armed 
bandit problem (see, e.g., Rothschild (1974), Bergemann and Valimaki (2006), Keller, 
Rady and Cripps (2005), Besanko and Wu (2008), Klein and Rady (2008), Moscarini 
and Squintani (2004), Roberts and Weitzman (1981), Weitzman (1979)): a decision 
maker (DM) has finitely many actions, called arms, each of which yields a payoff with an 
unknown distribution that is taken from a finite set of distributions. Each time the 
DM chooses an arm, he obtains a payoff and improves his information regarding 
the true payoff distribution of the arm he has just chosen. 

Gittins and Jones (1979) proved that, in discrete-time, the optimal strategy of 
the DM has a particularly simple form: at every period the DM calculates a real 
number, an index, for each arm, based on past observations of that arm, and he 
chooses the arm with the highest index. It turns out that to calculate the index 
of an arm it is sufficient to consider an auxiliary problem with two arms: the arm 
for which we calculate the index, and an arm that yields a constant payoff. The 
literature therefore focuses on such problems, called two-armed bandit problems. 

Once the optimality of the index strategy is guaranteed, one looks for the relation 
between the data of the game and the index. Explicit formulas for the index when 
the payoff is one of two distributions that have a simple form have been established 
in the literature. Berry and Fristedt (1985) provide the solution to the problem in 
discrete-time, e.g., when the payoff distribution is one of two Bernoulli distributions, 
and in continuous-time, e.g., when the payoff distribution is one of two Brownian 
motions. In continuous-time, by studying the dynamic programming equation that 
describes the problem, Keller, Rady and Cripps (2005) and Keller and Rady (2008) 
provided an explicit form for the index when the payoff's distribution is Poisson.¹ 
When the payoff distribution is known, Karatzas (1984) characterized the index 
when the payoff's distribution is a diffusion process, and Kaspi and Mandelbaum 
(1995) characterized the index when the payoff's distribution is a Levy process, and 
they obtained an explicit form for the index for special distributions. 

In practice, payoff processes have a complex form, exhibiting both small random 
changes, that can be modeled by a Brownian motion, and large shocks that can be 
modeled as arriving at a Poisson rate. A stochastic process that incorporates these 
two types of changes is the Levy process. In fact, Carr and Wu (2004) argue that 
almost all economic phenomena can be described by time shifts of Levy processes. 
Therefore, it is desirable to study the bandit problem where the payoff distribution 
is one of finitely many Levy distributions. 

In the present paper we provide an explicit solution to the two-armed bandit 
problem where the payoff distribution is one of two Levy processes. We assume 
that one distribution, called High, dominates the other, called Low, in a strong 
sense (see Assumption 2.1 below). To eliminate trivial cases, we assume that the 
expected payoff that is generated by the safe arm is lower than the expected payoff 
generated by the High distribution, and higher than the expected payoff generated 
by the Low distribution. 

In such a case in discrete-time, the optimal strategy is a cut-off strategy: the 
DM keeps on experimenting as long as the posterior belief that the distribution is 
High is higher than some cut-off point, and, once the posterior probability that the 
distribution is High falls below the cut-off point, the DM switches to the safe arm. 
We prove that when the two payoff distributions are Levy processes that satisfy 
several requirements, the optimal strategy is a cut-off strategy, and we provide an 
explicit expression for the cut-off point, in terms of the data of the problem. When 
particularized to the models studied by Kaspi and Mandelbaum (1995), Bolton 
and Harris (1999), Keller, Rady and Cripps (2005) and Keller and Rady (2008), 
our expression reduces to the expressions that they obtained. 

1 These authors also studied the strategic setup, in which several DMs have the same set of 
arms, and their arms' payoff distributions are the same (and unknown), and they compared the 
cooperative solution to the non-cooperative solution. 

Apart from unifying previous results, our characterization shows that the special 
form of the optimal payoff derived by Bolton and Harris (1999) and Keller, Rady 
and Cripps (2005) is valid in a general setup: the optimal payoff is the sum of the 
expected payoff if no information is available, and of an option value, that measures 
the expected gain from the ability to experiment. It also shows that the data of the 
problem can be divided into information-relevant parameters and payoff-relevant 
parameters; the information-relevant parameters can be summarized in a single real 
number, and the payoff-relevant parameters are the expectations of the processes 
that contribute to the DM's payoff. Finally, the characterization allows one to 
derive comparative statics on the optimal cut-off and payoff. For example, as the 
discount rate increases, or the signals become less informative, the cut-off point 
increases but the DM's optimal payoff decreases. 

It is often the case that the DM holds an incorrect prior, and plays optimally 
given that prior (for further discussion and literature review, see Section 2.4). We 
provide an explicit expression for the value function in this case. Our technique 
provides a new description of the optimal strategy with a time-dependent cut-off. 

So far we have assumed that all the information that the DM has is the payoff 
process. Sometimes, the DM has information that does not contribute to the payoff 
process, yet it helps him learn the type of the risky arm. For example, scientific discoveries 
made by other firms in other markets may shed light on the appropriateness 
of a given technology to a product that a firm develops. In addition, part of the 
payoff of the DM may not be observed by the DM. 

In Section 2.5 we study a bandit problem in which the risky arm generates three 
Levy processes: the first is observed by the DM and contributes to the payoff, the 
second is observed by the DM and does not contribute to the payoff, and the third 
is not observed by the DM but it contributes to the payoff. We provide an explicit 
expression for the cut-off and for the optimal payoff of the DM. That generalizes the 
expression we found when the risky arm generates only one process. This analysis 
clarifies the distinction between information-relevant data and payoff-relevant data. 

We conclude the paper by applying our characterization to compare the effects 
of optimism and pessimism in bandit problems. A DM is called an optimist if his prior 
probability that the payoff's distribution is High is higher than the true probability, 
and he is called a pessimist if his prior probability that the payoff's distribution 
is High is lower than the true probability. Using our characterization we find that 
unless the pessimist assigns high probability to the High type and the two DMs 
are sufficiently patient, an optimist will fare better than a pessimist. 

The rest of the paper is arranged as follows. The model and the main results 
appear in Section 2. Directions for future research appear in Section 3. All proofs 
appear in Section 4. 

2. The Model and the Main Results 

2.1. Reminder about Levy Processes. Levy processes are the continuous-time 
analog of discrete-time random walks with i.i.d. increments. A Levy process 
$X = (X(t))_{t \geq 0}$ is a continuous-time stochastic process that (a) starts at the origin: 
$X(0) = 0$, (b) admits a cadlag modification,² and (c) has stationary independent 
increments. See Figures 1 and 2 for a generic path of a Levy process. A few examples 
of Levy processes are a Brownian motion, a Poisson process, and a compound 
Poisson process. The latter is a continuous-time process in which jumps arrive 
according to a Poisson process and the jump sizes are i.i.d.³ 

We now present the Levy-Ito decomposition of Levy processes. Let $(X(t))$ be a 
Levy process. For every Borel measurable set $A \subseteq \mathbb{R} \setminus \{0\}$, define: 

$\nu(A) := E\left[ \#\{ 0 \leq s \leq 1 \mid \Delta X(s) := X(s) - X(s-) \in A \} \right].$ 

This is the expected number of jumps with size in $A$ that occur up to time 1. 
One can verify that $\nu$ is a measure, called the Levy measure of the process. The 
quantity $\nu(\mathbb{R} \setminus \{0\})$ is the expected number of jumps that occur up to time 1. If 
$\nu(\mathbb{R} \setminus \{0\}) = \infty$, then the number of jumps in the time interval [0, 1], and therefore 
in any compact time interval, is infinite a.s., and we say that the Levy measure is 
infinite. If $\nu(\mathbb{R} \setminus \{0\}) < \infty$, then the number of jumps in any compact time 
interval is finite a.s., and we say that the Levy measure is finite. In this paper we 
study Levy processes that have finite Levy measures. The results can be generalized 
to Levy processes with infinite Levy measures; see Cohen and Solan (2009). 

The Levy-Ito decomposition (see Applebaum (2004)) states that every Levy 
process with finite Levy measure can be represented as follows: 

(2.1)    $X(t) = \mu t + \sigma Z(t) + L^{\nu}(t),$ 

where $\mu t$ is a linear drift, $\sigma Z(t)$ is a Brownian motion with standard deviation $\sigma$, 
and $L^{\nu}(t)$ is a compound Poisson process with Levy measure $\nu$ which is independent 
of $\sigma Z(t)$: jumps arrive according to a Poisson process with rate $\nu(\mathbb{R} \setminus \{0\})$, and the 
distribution of each jump is given by the distribution function $\nu(\cdot) / \nu(\mathbb{R} \setminus \{0\})$. 
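As a rough illustration of the decomposition (2.1), the following Python sketch simulates one discretized path of a Levy process with finite Levy measure. The function name `simulate_levy_path`, the fixed time grid, and the callback `draw_jump` (assumed to sample a jump size from the normalized Levy measure) are choices made for this example only and do not come from the paper.

```python
import math
import random


def simulate_levy_path(mu, sigma, jump_rate, draw_jump, horizon, n_steps, rng):
    """Simulate X(t) = mu*t + sigma*Z(t) + L(t) on a grid of n_steps steps.

    jump_rate plays the role of nu(R \\ {0}); draw_jump(rng) returns one jump size.
    Returns the lists (times, values), both of length n_steps + 1.
    """
    dt = horizon / n_steps
    times, values = [0.0], [0.0]           # a Levy process starts at the origin
    x = 0.0
    # Arrival time of the first jump (exponential inter-arrival times).
    next_jump = rng.expovariate(jump_rate) if jump_rate > 0 else math.inf
    for step in range(1, n_steps + 1):
        t = step * dt
        # Drift and Brownian increment over [t - dt, t).
        x += mu * dt + sigma * math.sqrt(dt) * rng.gauss(0.0, 1.0)
        # Add every compound Poisson jump that arrived up to time t.
        while next_jump <= t:
            x += draw_jump(rng)
            next_jump += rng.expovariate(jump_rate)
        times.append(t)
        values.append(x)
    return times, values
```

For instance, `simulate_levy_path(0.1, 0.3, 2.0, lambda rng: rng.uniform(0.5, 1.5), 1.0, 1000, random.Random(0))` draws a path with drift 0.1, volatility 0.3, and on average two jumps per unit of time.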

2.2. Levy Bandits. A DM operates a two-armed bandit machine in continuous-time, 
with a safe arm that yields a constant payoff $s$, and a risky arm that yields 
a stochastic payoff $(X(t))$. The risky arm can be of two types, High or Low. 
We denote the arm's type by $\theta$: if the type is High (resp. Low) we set $\theta = \theta_1$ 
(resp. $\theta = \theta_2$). If $\theta = \theta_i$, $i \in \{1,2\}$, the risky arm yields payoff $(X_i(t))$, which is a 
Levy process. We assume throughout that the Levy measures of both $(X_1(t))$ and 
$(X_2(t))$ are finite, and therefore a.s. there are only finitely many jumps in each 
compact time interval. Denote the Levy-Ito decomposition of $(X_i(t))$, $i \in \{1,2\}$, 
by $X_i(t) = \mu_i t + \sigma_i Z(t) + L^{\nu_i}(t)$. 



2 That is, it is continuous from the right, and has limits from the left: for every $t_0$, the limit 
$X(t_0-) := \lim_{t \nearrow t_0} X(t)$ exists a.s. and $X(t_0) = \lim_{t \searrow t_0} X(t)$. 

3 Formally, let $\lambda > 0$, and let $D$ be a distribution over $\mathbb{R} \setminus \{0\}$. A compound Poisson process with 
rate $\lambda$ and jump size distribution $D$ is a continuous-time stochastic process given by $X(t) = \sum_{i=1}^{N(t)} D_i$, 
where $N(t)$ is a Poisson process with rate $\lambda$, and $D_i$ are i.i.d. random variables, with distribution 
function $D$, which are also independent of $(N(t))_{t \geq 0}$. 

Set $\nu_i := \nu_i(\mathbb{R} \setminus \{0\})$, and denote by $H_i := \int h \nu_i(dh) / \nu_i$ the expected jump size 
of $(X_i(t))$.⁴ We assume that $H_i$ is finite. The quantity $\int h \nu_i(dh) = \nu_i H_i$ is the 
contribution of the compound Poisson process to the instantaneous payoff. The 
expectation of the risky arm at time $t = 1$ if $\theta = \theta_i$ is $g_i := E[X_i(1)] = \nu_i H_i + \mu_i$. 
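To make the quantities $\nu_i$, $H_i$ and $g_i$ concrete, here is a minimal Python sketch for a Levy measure concentrated on finitely many jump sizes. The representation of the measure as a dictionary `levy_atoms` (jump size mapped to its mass) and the function name are assumptions of this example, not notation from the paper.

```python
def expected_payoff_rate(mu, levy_atoms):
    """Return (nu, H, g) for drift mu and a Levy measure with finitely many atoms.

    levy_atoms maps each jump size h to its mass nu({h}).
    nu is the total jump rate, H the expected jump size, g = nu*H + mu.
    """
    nu = sum(levy_atoms.values())
    # Discrete version of the integral of h * nu(dh).
    jump_flow = sum(h * mass for h, mass in levy_atoms.items())
    H = jump_flow / nu if nu > 0 else 0.0   # convention: H = 0 when there are no jumps
    g = nu * H + mu
    return nu, H, g
```

With `mu = 0.5` and `levy_atoms = {1.0: 2.0, 3.0: 1.0}`, jumps of size 1 arrive at rate 2 and jumps of size 3 at rate 1, so the jump part contributes $2 \cdot 1 + 1 \cdot 3 = 5$ per unit of time in expectation.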

Throughout we make the following assumption, which states that the High type 
is better than the Low type in a strong sense. 

Assumption 2.1. 

A1. $-\infty < g_2 < s < g_1 < \infty$. 
A2. $\sigma_1 = \sigma_2$. 
A3. For every $A \in \mathcal{B}(\mathbb{R} \setminus \{0\})$, $\nu_2(A) \leq \nu_1(A) < \infty$. 

Assumptions A1 and A2 rule out trivial cases. Assumption A1 merely says that 
the High (resp. Low) type provides a higher (resp. lower) expected payoff than the 
safe arm. Assumption A2 states that the Brownian motion components of the 
High type and the Low type have the same standard deviation, which we denote by $\sigma$. Otherwise, since the 
realized path reveals the standard deviation, the DM can distinguish between the 
types in any infinitesimal time interval. 

The third part of the assumption is less innocuous; it requires that the Levy 
measure of the High type dominate the Levy measure of the Low type in a 
strong sense: roughly, jumps of any size $h$ occur more often (or at the same rate) 
under the High type than under the Low type. 

A consequence of this assumption is that jumps always provide good news, and 
(weakly) increase the posterior probability of the High type. 
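The step behind this claim can be written out as follows. Here $p^{+}$ is illustrative notation (not used in the paper) for the posterior immediately after a jump of size $h$, when the belief just before the jump is $p$; the inequality uses $\nu_2(dh) \leq \nu_1(dh)$ from Assumption A3.

```latex
p^{+}
  = \frac{p\,\nu_1(dh)}{p\,\nu_1(dh) + (1-p)\,\nu_2(dh)}
  \;\geq\; \frac{p\,\nu_1(dh)}{p\,\nu_1(dh) + (1-p)\,\nu_1(dh)}
  = p .
```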

At each time instant $t$, the DM chooses the proportion of time to devote to 
each of the two arms. If he chooses to devote a proportion $k$ of the current time 
instant to the risky arm, and the type is $\theta_i$, then the instantaneous payoff $dY_k(t)$ is the sum of several 
terms: 

• $(1 - k) s\,dt$, which is the contribution of the safe arm; 

• $k \mu_i\,dt$, which is the contribution of the linear drift; 

• $\sqrt{k} \sigma\,dZ(t)$, which is the contribution of the Brownian motion;⁵ 

• $k \nu_i H_i\,dt$, which is the expected contribution of the compound Poisson process. 

A strategy $\kappa$ is a (measurable) function that assigns to each history a number in 
the interval [0, 1], which is interpreted as the proportion of the time interval $[t, t + dt)$ 
devoted to the risky arm. In continuous time, it is usually assumed that a strategy 
is predictable, that is, to determine the behavior at time $t$, it is sufficient to know 
the history strictly before time $t$. Formally, $\kappa_t$ is $\mathcal{F}_{t-}$-measurable, where $I(t) := 
\int_0^t r e^{-rs}\,dY_\kappa(s)$ is the stochastic discounted payoff using the strategy $\kappa$, and $\mathcal{F}_{t-}$ 
is the $\sigma$-algebra generated by the stochastic process $(I(t))$ of the discounted payoff 
with discount rate $r$, up to (excluding) time $t$. 

As is well known, in continuous-time the play path need not be uniquely defined, 
and therefore one should be careful in defining the set of strategies available to the 
DM, and the play path that a strategy defines. We circumvent this issue by arguing 
that an optimal strategy must solve a certain Functional Differential Equation 
(FDE), by showing that this FDE has a unique solution, and by exhibiting this 
solution. As for discrete-time, this solution turns out to be a cut-off strategy, 
which we now define. 

4 In order to simplify notation, we denote $\int h \nu_i(dh) := \int_{\mathbb{R} \setminus \{0\}} h \nu_i(dh)$. 

5 By devoting the proportion $k$ of the time interval $[t, t + dt)$ to the risky arm, the variance 
of the continuous part of the payoff is $k \sigma^2 dt$. This explains the scaling parameter $\sqrt{k} \sigma$. For a 
qualitative explanation for this form, see Bolton and Harris (1999). 

Let $p_t := P(\theta = \theta_1 \mid \mathcal{F}_t)$ be the posterior belief at time $t$ that the risky arm is 
High. A strategy $\kappa$ is a cut-off strategy with cut-off point $p$ if the DM chooses the 
safe arm (with probability 1) whenever $p_t \leq p$, and the risky arm (with probability 
1) whenever $p_t > p$. We now argue that Assumption A3 guarantees that the play 
under a cut-off strategy is well defined. Suppose that $p_0 > p$. Then the DM chooses 
the risky arm until the first time $t$ that satisfies $p_t = p$. Since the Levy payoff 
processes have finite measures, Assumption A3 implies that in an infinitesimal 
interval after time $t$ the posterior belief will drop below the cut-off. Indeed, the 
first time $t$ satisfying $p_t = p$ is a predictable stopping time, and therefore, the 
probability that a jump occurs at that time is zero (see Bertoin (1996)). Therefore, 
$P(\theta = \theta_1 \mid \mathcal{F}_t) = P(\theta = \theta_1 \mid \mathcal{F}_{t-}) = p$. If there is a Brownian motion component, its 
fluctuations will cause the posterior to drop below $p$ in an infinitesimal time interval 
after time $t$. If there is no Brownian motion component, since the Levy measure 
is finite, there are no jumps in an infinitesimal time interval, and by Assumption 
A3 the compound Poisson process will cause the posterior to drop below $p$. This 
implies that the play under a cut-off strategy is well defined. 



2.3. The Optimal Strategy. The expected discounted payoff under a strategy $\kappa$ 
when the prior is $p_0 = p$ is 

$V_\kappa(p) = E\left[ \int_0^\infty r e^{-rt}\,dY_\kappa(t) \right] = p\,E\left[ \int_0^\infty r e^{-rt}\,dY_\kappa(t) \,\middle|\, \theta_1 \right] + (1 - p)\,E\left[ \int_0^\infty r e^{-rt}\,dY_\kappa(t) \,\middle|\, \theta_2 \right].$ 

Let $U(p) := \sup_\kappa V_\kappa(p)$ be the maximal payoff the DM can achieve. As we show 
below, the DM has an optimal strategy, so in fact the supremum in the definition 
of $U(p)$ is achieved. The function $V_\kappa(p)$ is linear with respect to $p$, and therefore 
$U(p)$, as the supremum of linear functions, is convex. By always choosing the safe 
arm, the DM can achieve at least $s$; since $U(0) = s$, the convexity of $U(p)$ implies 
that $U$ is non-decreasing. 

Proposition 2.2. $U(p)$ is monotone non-decreasing, convex, and continuous in $p$. 

It follows from Proposition 2.2 that there is $p^*$ such that $U(p) = s$ if $p \leq p^*$ and 
$U(p) > s$ otherwise, so that the strategy $\kappa \equiv 0$ that always chooses the safe arm is 
optimal for prior beliefs in $[0, p^*]$. 

Our first theorem states that there is a unique optimal strategy, which is a cut-off 
strategy. Moreover, it provides the exact cut-off point and the corresponding expected 
payoff in terms of the data of the problem. Let $\alpha$ be the unique solution 
of 

(2.2)    $f(\eta) := \int \nu_2(dh) \left( \frac{\nu_2(dh)}{\nu_1(dh)} \right)^{\eta} + \eta(\nu_1 - \nu_2) - \nu_2 + \frac{1}{2} \eta(\eta + 1) \left( \frac{\mu_1 - \mu_2}{\sigma} \right)^2 - r = 0$ 

in $(0, \infty)$. The existence and uniqueness of such a solution are proved in Lemma 4.4 
below. Observe that $\frac{\nu_2(dh)}{\nu_1(dh)}$, the Radon-Nikodym derivative, exists by Assumption 
A3.⁶ 
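The root $\alpha$ of $f$ can be computed numerically. The sketch below does this by bisection for Levy measures concentrated on finitely many jump sizes, reading $f$ as the sum of the jump terms $\int \nu_2(dh)(\nu_2(dh)/\nu_1(dh))^{\eta} + \eta(\nu_1 - \nu_2) - \nu_2$, the Brownian term $\frac{1}{2}\eta(\eta+1)((\mu_1 - \mu_2)/\sigma)^2$, and $-r$. The bisection method, the function names, the list-of-atoms representation, and the parameter `snr` (standing for $(\mu_1 - \mu_2)/\sigma$) are assumptions of this example; the paper itself only proves existence and uniqueness of the root.

```python
def f(eta, atoms, snr, r):
    """Left-hand side of the equation for alpha, for Levy measures with finitely many atoms.

    atoms: list of (nu1_mass, nu2_mass) pairs, one per jump size, with
           nu2_mass <= nu1_mass as required by Assumption A3.
    snr:   (mu1 - mu2) / sigma, or 0.0 when there is no Brownian component.
    r:     discount rate.
    """
    nu1 = sum(a for a, _ in atoms)
    nu2 = sum(b for _, b in atoms)
    # Discrete integral of nu2(dh) * (nu2(dh)/nu1(dh))**eta; atoms with nu2 = 0 contribute 0.
    integral = sum(b * (b / a) ** eta for a, b in atoms if b > 0)
    return integral + eta * (nu1 - nu2) - nu2 + 0.5 * eta * (eta + 1) * snr ** 2 - r


def solve_alpha(atoms, snr, r):
    """Bisection for the positive root of f; assumes r > 0 and informative signals."""
    lo, hi = 0.0, 1.0
    while f(hi, atoms, snr, r) < 0:       # f(0) = -r < 0, so enlarge hi until f(hi) >= 0
        hi *= 2.0
        if hi > 1e12:
            raise ValueError("no sign change found: the signals carry no information")
    for _ in range(200):                   # halve the bracket until it is numerically tight
        mid = 0.5 * (lo + hi)
        if f(mid, atoms, snr, r) < 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

Bisection is enough here because $f$ is convex in $\eta$ with $f(0) = -r < 0$, so it crosses zero exactly once on $(0, \infty)$ whenever the observations are informative.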

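Theorem 2.3. Denote $p^* := \frac{\alpha(s - g_2)}{(\alpha + 1)(g_1 - s) + \alpha(s - g_2)}$. Under Assumptions A1-A3, the 
unique optimal strategy is 

$\kappa^* = \begin{cases} 0 & \text{if } p \leq p^*, \\ 1 & \text{if } p > p^*. \end{cases}$ 

The expected payoff under $\kappa^*$ is 

(2.3)    $U(p) = V_{\kappa^*}(p) = \begin{cases} s & \text{if } p \leq p^*, \\ p g_1 + (1 - p) g_2 + C_\alpha (1 - p) \left( \frac{1 - p}{p} \right)^{\alpha} & \text{if } p > p^*, \end{cases}$ 

where $C_\alpha = \frac{s - g_2 - p^*(g_1 - g_2)}{(1 - p^*) \left( \frac{1 - p^*}{p^*} \right)^{\alpha}}$. 

The term $p g_1 + (1 - p) g_2$ in (2.3) is the expected payoff for a DM choosing only 
the risky arm. Thus, $C_\alpha (1 - p) \left( \frac{1 - p}{p} \right)^{\alpha}$ is the option value for the ability to switch to 
the safe arm. The quantity $\alpha$ summarizes all the information-relevant parameters 
of the problem (see also Section 2.5). Apart from that, the parameters of the payoff 
processes that determine the cut-off point $p^*$ and the optimal payoff $U(p)$ are the 
expected payoffs $g_1$ and $g_2$. 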
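As a minimal numerical sketch of Theorem 2.3, the following Python functions evaluate the cut-off $p^* = \alpha(s - g_2) / ((\alpha + 1)(g_1 - s) + \alpha(s - g_2))$ and the value function, with the option-value constant $C_\alpha$ chosen so that the value is continuous at $p^*$. The function names are hypothetical, and $\alpha$ is taken as a given input rather than solved for.

```python
def cutoff(alpha, s, g1, g2):
    """Cut-off belief p* of Theorem 2.3 (requires g2 < s < g1 and alpha > 0)."""
    return alpha * (s - g2) / ((alpha + 1) * (g1 - s) + alpha * (s - g2))


def optimal_value(p, alpha, s, g1, g2):
    """Optimal payoff U(p): s below the cut-off, risky payoff plus option value above it."""
    p_star = cutoff(alpha, s, g1, g2)
    if p <= p_star:
        return s
    # Constant C_alpha, pinned down by continuity of U at p*.
    c_alpha = (s - g2 - p_star * (g1 - g2)) / (
        (1 - p_star) * ((1 - p_star) / p_star) ** alpha
    )
    risky_only = p * g1 + (1 - p) * g2
    option_value = c_alpha * (1 - p) * ((1 - p) / p) ** alpha
    return risky_only + option_value
```

For example, with `alpha = 0.2`, `s = 1`, `g1 = 2`, `g2 = 0`, the cut-off is $0.2 / 1.4 \approx 0.143$, well below the belief $0.5$ at which the two arms have the same expected payoff.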

The function in (2.3) has the same structure as the solution of Bolton and 
Harris (1999) and Keller, Rady and Cripps (2005) for the one agent problem. 
In Bolton and Harris (1999), the only component in the risky arm is the Brownian 
motion with drift. Therefore $\nu_i = 0$, so that $g_1 = \mu_1$, $g_2 = \mu_2$, and 
$\alpha = \left( -1 + \sqrt{1 + 8 r \sigma^2 / (\mu_1 - \mu_2)^2} \right) / 2$. In Keller, Rady and Cripps (2005), the 
risky arm is either the constant zero (Low type, so that $\nu_2 = 0$), or yields a payoff 
$h$ according to a Poisson process with rate $\lambda$ (High type). If the risky arm is 
High, the only component in the Levy-Ito decomposition is the compound Poisson 
component, and $\nu_1(h) = \lambda$ and zero otherwise. Therefore, $g_1 = \lambda h$, $g_2 = 0$, and 
$\alpha = r / \lambda$. 
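Both special cases can be checked directly against the equation $f(\eta) = 0$, reading $f$ as the jump terms $\int \nu_2(dh)(\nu_2(dh)/\nu_1(dh))^{\eta} + \eta(\nu_1 - \nu_2) - \nu_2$ plus the Brownian term $\frac{1}{2}\eta(\eta+1)((\mu_1 - \mu_2)/\sigma)^2$ minus $r$. The short derivation below is a worked check under that reading, not a quotation from the cited papers.

```latex
% Bolton and Harris (1999): \nu_1 = \nu_2 = 0, so only the Brownian term remains
\tfrac{1}{2}\,\eta(\eta+1)\Big(\tfrac{\mu_1-\mu_2}{\sigma}\Big)^{2} = r
\;\Longleftrightarrow\;
\eta^{2} + \eta - \frac{2 r \sigma^{2}}{(\mu_1-\mu_2)^{2}} = 0
\;\Longrightarrow\;
\alpha = \frac{-1 + \sqrt{1 + 8 r \sigma^{2}/(\mu_1-\mu_2)^{2}}}{2}
\quad\text{(the positive root)}.

% Keller, Rady and Cripps (2005): \nu_2 = 0, \nu_1 = \lambda, no Brownian term
\eta\,\lambda - r = 0
\;\Longrightarrow\;
\alpha = \frac{r}{\lambda}.
```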

The explicit form of the cut-off point $p^*$ and of the value function $U$ allows us 
to derive simple comparative statics. As is well known, a DM who plays optimally 
switches to the safe arm later than a myopic DM, and indeed, $p^*$ is smaller than 
the myopic cut-off point $p^m := \frac{s - g_2}{g_1 - g_2}$. Furthermore, the cut-off point $p^*$ is an 
increasing function of $\alpha$. As can be expected, $\alpha$ (and therefore also $p^*$) increases 
in the discount rate $r$ and in $\nu_2(dh)$, and it decreases in $\nu_1(dh)$ and in $|\mu_1 - \mu_2|$: 
the DM switches to the safe arm earlier as the discount rate increases, as jumps 
provide less information, or as the difference between the drifts of the two types 
decreases.⁷ Furthermore, as long as $p > p^*$, the value function $U(p)$ decreases in 
$\alpha$. Thus, decreasing the discount rate, increasing the informativeness of the jumps, 
and increasing the difference between the drifts are all beneficial to the DM. 
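The comparison with the myopic cut-off follows in two lines from the formula $p^* = \alpha(s - g_2) / ((\alpha + 1)(g_1 - s) + \alpha(s - g_2))$. The abbreviations $A$ and $B$ below are introduced only for this worked step.

```latex
A := s - g_2 > 0, \qquad B := g_1 - s > 0, \qquad
p^* = \frac{\alpha A}{(\alpha+1)B + \alpha A}, \qquad
p^m = \frac{A}{A+B}.

p^* < p^m
\;\Longleftrightarrow\;
\alpha (A+B) < (\alpha+1)B + \alpha A
\;\Longleftrightarrow\;
0 < B .

p^* = \frac{A}{A + B\,(1 + 1/\alpha)}
\ \text{is increasing in } \alpha,
\ \text{and } p^* \to p^m \text{ as } \alpha \to \infty .
```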



6 To ensure the existence of the Radon-Nikodym derivative, and therefore the form of the 
solution, one does not need the full power of Assumption A3. Its full power will be used within 
the proof. 

7 Moreover, $\alpha(r = 0) = 0$ and $\alpha(r = \infty) = \infty$. 

In Section 2.4 we extend the results to the case that the prior belief of the DM 
is not the true prior $p_0$. In Sections 2.5 and 2.6 we provide two applications of our 
techniques and results. 



Remark 2.4. Incomplete information of $p_0$. Suppose that the DM does not 
know the prior belief $p_0$, but rather has some belief $\varphi$ over $p_0$. That is, $p_0$ is 
chosen at the outset according to $\varphi$, and is not told to the DM. From the DM's 
point of view, the situation is equivalent to an auxiliary problem in which the prior 
probability of the High type is the expectation of the corresponding probability in 
the original problem, and therefore Theorem 2.3 provides the optimal strategy in 
this case as well. 



2.4. The Payoff with Incorrect Prior. In decision problems it is usually assumed 
that the DM either knows the true state of nature, or has some prior distribution 
over the set of states of nature. Experiments show that the prior distribution 
that DMs have is often different from the true prior. The phenomenon of overconfidence, 
that is, assigning too high a probability to the good state of nature, has been 
observed in various areas (Svenson (1981), Baumhart (1968), Larwood and Whittaker 
(1977), Cross (1977), Weinstein (1980), Camerer and Lovallo (1999)). Babcock 
and Loewenstein (1997) argue that biases in bargaining may be self-serving, 
and Heifetz, Shannon and Spiegel (2007) show that biases of preferences may be 
stable. 

In every decision problem, a DM who correctly perceives the prior distribution 
will fare at least as well as a DM who has some bias, and believes that the prior 
distribution is different from the correct one. Indeed, an optimal strategy of a DM 
who correctly perceives the prior distribution is a strategy that yields the highest 
possible gains for this prior distribution, so it yields at least as much as any other 
strategy, in particular, optimal strategies for incorrect prior distributions. 

Denote the initial belief of the DM for the High type by $q_0$, and suppose that 
it may be different from the true probability $p_0$. By Theorem 2.3 the optimal 
strategy of the DM is a cut-off strategy; however, since he has an incorrect prior, 
he does not switch to the safe arm at the optimal time. In this section we give 
an exact formula for the payoff, assuming the DM plays optimally given his belief. 
We will also describe the optimal strategy from a different point of view, not as a 
cut-off strategy. This point of view is arguably closer to the way people perceive 
the decision problem that the DM faces. 

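Suppose that until time $t$, the DM chose the risky arm, and observed the 
jumps $h_1, \ldots, h_n$ from the compound Poisson component of the payoff process. Let 
$Y_B(t)$ be the Brownian motion with drift component of the payoff process from 
the risky arm at time $t$. Note that, given the type $\theta_i$, $Y_B(t) \sim N(\mu_i t, \sigma^2 t)$. The posterior belief 
$q_t := P_t(\theta_1 \mid h_1, \ldots, h_n; Y_B(t); q_0)$ of the DM is⁸ 

(2.4)    $q_t = \frac{q_0 \frac{1}{\sqrt{2\pi\sigma^2 t}} e^{-\frac{(Y_B(t) - \mu_1 t)^2}{2\sigma^2 t}} e^{-\nu_1 t} \prod_{t-} \nu_1(dh_j)}{q_0 \frac{1}{\sqrt{2\pi\sigma^2 t}} e^{-\frac{(Y_B(t) - \mu_1 t)^2}{2\sigma^2 t}} e^{-\nu_1 t} \prod_{t-} \nu_1(dh_j) + (1 - q_0) \frac{1}{\sqrt{2\pi\sigma^2 t}} e^{-\frac{(Y_B(t) - \mu_2 t)^2}{2\sigma^2 t}} e^{-\nu_2 t} \prod_{t-} \nu_2(dh_j)}$ 

$\phantom{q_t} = \frac{q_0\, e^{\mu_1 Y_B(t)/\sigma^2 - \mu_1^2 t/2\sigma^2 - \nu_1 t} \prod_{t-} \nu_1(dh_j)}{q_0\, e^{\mu_1 Y_B(t)/\sigma^2 - \mu_1^2 t/2\sigma^2 - \nu_1 t} \prod_{t-} \nu_1(dh_j) + (1 - q_0)\, e^{\mu_2 Y_B(t)/\sigma^2 - \mu_2^2 t/2\sigma^2 - \nu_2 t} \prod_{t-} \nu_2(dh_j)}.$ 

Indeed, $\frac{1}{\sqrt{2\pi\sigma^2 t}} e^{-\frac{(Y_B(t) - \mu_i t)^2}{2\sigma^2 t}}$ is the probability of receiving the payoff $Y_B(t)$, given the 
type $\theta_i$, and $e^{-\nu_i t} \prod_{t-} \nu_i(dh_j)$ is the probability of receiving the $n$ jumps that 
occurred until time $t$, given the type $\theta_i$. The first equality in (2.4) is the Bayesian 
belief updating, using the independence of the components in the Levy-Ito decomposition, 
given the type of the risky arm, and the second equality is obtained by 
eliminating common components. For a generic path of the process $(q_t)$, see Figure 
3. 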
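The Bayesian update can be sketched in a few lines of Python by working with the log-odds of the belief, which turns the ratio of the two likelihoods into a sum. The function name `posterior`, the argument order, and the encoding of the observed jumps as a list of likelihood ratios $\nu_1(dh_j)/\nu_2(dh_j)$ are assumptions of this example.

```python
import math


def posterior(q0, y_b, t, mu1, mu2, sigma, nu1, nu2, jump_ratios):
    """Posterior probability of the High type after playing the risky arm up to time t.

    y_b:         value Y_B(t) of the Brownian-with-drift component at time t.
    nu1, nu2:    total jump rates under High and Low.
    jump_ratios: list with nu1(dh_j)/nu2(dh_j) for every jump observed so far
                 (math.inf encodes a jump that is impossible under Low).
    Assumes 0 < q0 < 1.
    """
    log_odds = math.log(q0 / (1.0 - q0))
    if sigma > 0:
        # Gaussian likelihood ratio after cancelling the common factors.
        log_odds += (mu1 - mu2) * y_b / sigma ** 2
        log_odds -= (mu1 ** 2 - mu2 ** 2) * t / (2.0 * sigma ** 2)
    log_odds -= (nu1 - nu2) * t                    # absence of jumps is (weakly) bad news
    log_odds += sum(math.log(ratio) for ratio in jump_ratios)
    # Numerically stable logistic transform from log-odds back to a probability.
    if log_odds >= 0:
        return 1.0 / (1.0 + math.exp(-log_odds))
    e = math.exp(log_odds)
    return e / (1.0 + e)
```

For example, starting from `q0 = 0.3`, a single jump that is twice as likely under High as under Low moves the odds from $3/7$ to $6/7$, that is, the belief from $0.3$ to $6/13$.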



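Suppose the DM follows a cut-off strategy $\kappa'$ with cut-off point $p'$: 

$\kappa' = \begin{cases} 0 & \text{if } p \leq p', \\ 1 & \text{if } p > p'. \end{cases}$ 

If $q_0 \leq p'$, the DM will always choose the safe arm. If $q_0 > p'$, the DM will 
initially choose the risky arm. The DM chooses the risky arm as long as $q_t > p'$, 
which, by Eq. (2.4), is equivalent to: 

(2.5)    $\frac{q_0\, e^{\mu_1 Y_B(t)/\sigma^2 - \mu_1^2 t/2\sigma^2 - \nu_1 t} \prod_{t-} \nu_1(dh_j)}{(1 - q_0)\, e^{\mu_2 Y_B(t)/\sigma^2 - \mu_2^2 t/2\sigma^2 - \nu_2 t} \prod_{t-} \nu_2(dh_j)} > \frac{p'}{1 - p'}.$ 

Without loss of generality assume that $\mu_1 - \mu_2 > 0$; the case $\mu_1 - \mu_2 < 0$ is handled 
similarly, and provides the same results. By taking the natural logarithm, and 
rearranging the resulting terms, we obtain that this inequality is equivalent to: 

(2.6)    $\frac{1}{\sigma} Y_B(t) > \left( \frac{\mu_1 + \mu_2}{2\sigma} + \frac{\sigma(\nu_1 - \nu_2)}{\mu_1 - \mu_2} \right) t - \frac{\sigma}{\mu_1 - \mu_2} \sum_{t-} \ln \frac{\nu_1(dh_j)}{\nu_2(dh_j)} - \frac{\sigma}{\mu_1 - \mu_2} \left[ \ln \frac{q_0}{1 - q_0} - \ln \frac{p'}{1 - p'} \right].$ 

The right-hand side in (2.6) is a piecewise linear function $F \cdot t - E - G_t$ of $t$, where 
the slope $F := \frac{\mu_1 + \mu_2}{2\sigma} + \frac{\sigma(\nu_1 - \nu_2)}{\mu_1 - \mu_2}$ is independent of $t$, the intercept at $t = 0$ is $-E$ with 

$E := \frac{\sigma}{\mu_1 - \mu_2} \times \left[ \ln \frac{q_0}{1 - q_0} - \ln \frac{p'}{1 - p'} \right]$, and $G_t := \frac{\sigma}{\mu_1 - \mu_2} \sum_{t-} \ln \frac{\nu_1(dh_j)}{\nu_2(dh_j)}$. 

Denote by $G_h := \frac{\sigma}{\mu_1 - \mu_2} \ln \frac{\nu_1(dh)}{\nu_2(dh)}$ the contribution of a jump of size $h$ to the intercept. 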



8 $\prod_{t-} \nu_i(dh_j)$ is the product of $\nu_i(dh_j)$ over all jumps $h_j$ that occur up to time $t-$. Similarly, 
we use the notation $\sum_{t-}$ for the sum over all jumps up to time $t-$. 

From Eq. (2.6) we obtain the following alternative description of the optimal 
strategy: The DM has a time-dependent cut-off which is piecewise linear. The 
slope of the cut-off function is always $F$, and whenever there is a jump of size $h$, 
the cut-off decreases by $G_h$ (see Figure 4). The DM chooses the risky arm as long 
as his current payoff from the continuous part of the Levy process, $Y_B(t)$, divided 
by the standard deviation, exceeds the cut-off. 

Thus, at first the DM plays until the payoff from the continuous part divided 
by the standard deviation satisfies $\frac{1}{\sigma} Y_B(t) \leq F \cdot t - E$; if a jump of size $h$ occurs 
before he switches to the safe arm, then the intercept decreases by $G_h$, and this 
behavior repeats itself. If there is no Brownian motion component, then $Y_B(t) = 0$ 
and $F = 0$; the DM chooses the risky arm for a fixed amount of time, and then 
switches to the safe arm, unless a jump occurs before the switch; if a jump occurred, 
the amount of time to choose the risky arm increases, as a function of $\frac{\nu_1(dh)}{\nu_2(dh)}$. 
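The time-dependent cut-off can be checked numerically against the belief-based rule. The sketch below computes the boundary $F \cdot t - E - G_t$ with slope $F = \frac{\mu_1 + \mu_2}{2\sigma} + \frac{\sigma(\nu_1 - \nu_2)}{\mu_1 - \mu_2}$ and the log-odds of the posterior, so that one can verify that the DM's belief equals $p'$ exactly when $Y_B(t)/\sigma$ hits the boundary. Function names and the encoding of jumps as likelihood ratios $\nu_1(dh_j)/\nu_2(dh_j)$ are assumptions of this example; it requires $\mu_1 > \mu_2$ and $\sigma > 0$.

```python
import math


def log_posterior_odds(y_b, t, q0, mu1, mu2, sigma, nu1, nu2, jump_ratios):
    """Log of q_t / (1 - q_t) given Y_B(t) = y_b and the observed jump likelihood ratios."""
    return (
        math.log(q0 / (1.0 - q0))
        + (mu1 - mu2) * y_b / sigma ** 2
        - (mu1 ** 2 - mu2 ** 2) * t / (2.0 * sigma ** 2)
        - (nu1 - nu2) * t
        + sum(math.log(ratio) for ratio in jump_ratios)
    )


def boundary(t, q0, p_cut, mu1, mu2, sigma, nu1, nu2, jump_ratios):
    """Time-dependent cut-off F*t - E - G_t for Y_B(t)/sigma."""
    scale = sigma / (mu1 - mu2)
    slope_f = (mu1 + mu2) / (2.0 * sigma) + scale * (nu1 - nu2)
    intercept_e = scale * (math.log(q0 / (1.0 - q0)) - math.log(p_cut / (1.0 - p_cut)))
    jumps_g = scale * sum(math.log(ratio) for ratio in jump_ratios)   # G_t
    return slope_f * t - intercept_e - jumps_g
```

At `t = 0` with `q0 > p_cut` the boundary is negative, so the DM starts on the risky arm; each observed jump with likelihood ratio above 1 lowers the boundary and prolongs experimentation.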

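Theorem 2.5. Under Assumption 2.1, for every $p_0 \in [0,1]$, the payoff $U(p_0, q_0)$ 
of a DM who holds the prior belief $q_0$ and uses a cut-off strategy with cut-off point $p'$ is as follows:⁹ if $q_0 > p'$, 

(2.7)    $U(p_0, q_0) = (s - g_1) p_0 \left( \frac{1 - q_0}{q_0} \right)^{\alpha + 1} \left( \frac{p'}{1 - p'} \right)^{\alpha + 1} + (s - g_2)(1 - p_0) \left( \frac{1 - q_0}{q_0} \right)^{\alpha} \left( \frac{p'}{1 - p'} \right)^{\alpha} + g_1 p_0 + g_2 (1 - p_0),$ 

while $U(p_0, q_0) = s$ if $q_0 \leq p'$. 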

One can verify that when the DM holds the correct prior, and plays according 
to the optimal strategy $\kappa^*$, Eq. (2.7) coincides with (2.3): $U(p_0, p_0) = U(p_0)$. Note 
that $U(p_0, q_0)$ is continuous in all its parameters. 

To calculate the payoff of a DM who plays optimally but has an incorrect prior, 
simply substitute the expression for $p^*$ from Theorem 2.3 as $p'$ in Theorem 2.5. 
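The following Python sketch evaluates the payoff of a DM with a possibly incorrect prior. It assumes that the payoff for $q_0 > p'$ has the form $(s - g_1) p_0 \rho^{\alpha+1} + (s - g_2)(1 - p_0) \rho^{\alpha} + g_1 p_0 + g_2 (1 - p_0)$ with $\rho = \frac{1 - q_0}{q_0} \cdot \frac{p'}{1 - p'}$, which is the form consistent with the value function of Theorem 2.3 when $q_0 = p_0$ and $p' = p^*$. The function name and argument order are hypothetical.

```python
def payoff_with_prior(p0, q0, p_cut, alpha, s, g1, g2):
    """Expected payoff when the true prior is p0, the DM believes q0, and the cut-off is p_cut.

    Assumes 0 < q0 <= 1 and 0 < p_cut < 1; alpha is the information parameter.
    """
    if q0 <= p_cut:
        return s                                   # the DM never experiments
    # Likelihood-ratio distance between the initial belief and the cut-off.
    rho = (1.0 - q0) / q0 * p_cut / (1.0 - p_cut)
    return (
        (s - g1) * p0 * rho ** (alpha + 1)         # loss from stopping when the arm is High
        + (s - g2) * (1.0 - p0) * rho ** alpha     # gain from stopping when the arm is Low
        + g1 * p0
        + g2 * (1.0 - p0)
    )
```

Under this form, fixing the true prior and the cut-off $p^*$ and varying the believed prior `q0`, the payoff is largest at `q0 = p0`, in line with the observation that a DM with the correct prior fares at least as well as a biased one.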

2.5. Information and Payoff. Theorems 2.3 and 2.5 express the optimal cut-off, 
the optimal expected payoff, and the expected payoff for a DM with incorrect prior 
who uses a cut-off strategy in terms of the expected payoff of each arm, and the 
quantity $\alpha$, which captures all the information-relevant parameters of the payoff 
processes. In this section we justify why $\alpha$ indeed captures all the information-relevant 
parameters. This justification explains why the solutions of Bolton and 
Harris (1999) and Keller, Rady and Cripps (2005) have the same structure as our 
solution. Suppose the DM faces a two-armed bandit problem with Levy payoffs. If 
the risky arm's type is High (resp. Low), it yields three independent Levy processes 
$(X_1^a(t))$, $(X_1^b(t))$, $(X_1^c(t))$ (resp. $(X_2^a(t))$, $(X_2^b(t))$, $(X_2^c(t))$). 

As in Section 2.2, the notation $\mu_i^j$, $\sigma_i^j$, $\nu_i^j$, and $g_i^j$ represents the drift of the process 
$(X_i^j(t))$, the standard deviation of the Brownian motion component of $(X_i^j(t))$, 
the Levy measure of the process $(X_i^j(t))$, and the expectation of the process $(X_i^j(t))$ 
at time $t = 1$, respectively, where $j \in \{a, b, c\}$, and $i \in \{1, 2\}$ is the arm's type. 

We assume that Assumption 2.1 is satisfied for the three couples $((X_1^j(t)), 
(X_2^j(t)))$, $j \in \{a, b, c\}$. 

9 We omit the dependence of $U$ on $p'$, $s$, $g_1$, $g_2$. Recall that $\alpha$ is the unique solution of the 
equation $f(\eta) = 0$ (see Eq. (2.2)). 

Suppose that the DM's payoff is $X_i^a + X_i^c$, while his information is $(X_i^a, X_i^b)$. 
Thus, the component $X_i^a$ represents observed information of the DM that contributes 
to his payoff, the component $X_i^b$ represents observed information that does 
not contribute to the payoff, and the component $X_i^c$ represents unobserved information 
that contributes to the payoff. The following theorem characterizes the 
optimal strategy of the DM in this setup, as well as his expected payoff when he 
holds a possibly incorrect prior. This characterization can be used to calculate the 
fair price of additional information. 

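Let $\beta$ be the unique solution of 

(2.8)    $f_{a,b}(\eta) := \int \nu_2^a(dh^a) \left( \frac{\nu_2^a(dh^a)}{\nu_1^a(dh^a)} \right)^{\eta} + \int \nu_2^b(dh^b) \left( \frac{\nu_2^b(dh^b)}{\nu_1^b(dh^b)} \right)^{\eta} + \eta(\nu_1^a + \nu_1^b - \nu_2^a - \nu_2^b) - \nu_2^a - \nu_2^b + \frac{1}{2} \eta(\eta + 1) \left[ \left( \frac{\mu_1^a - \mu_2^a}{\sigma^a} \right)^2 + \left( \frac{\mu_1^b - \mu_2^b}{\sigma^b} \right)^2 \right] - r = 0$ 

in $(0, \infty)$, where $\sigma^j$ denotes the common value $\sigma_1^j = \sigma_2^j$. 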



Theorem 2.6. The expected payoff to a DM who holds the prior belief $q_0$, uses 
a cut-off strategy $\kappa'$ with cut-off point $p'$, receives and observes the payoff process 
$(X^a(t))$, observes but does not receive the payoff process $(X^b(t))$, and receives but 
does not observe the payoff process $(X^c(t))$, is as follows: if $q_0 > p'$, 

$V(p_0, q_0) = (s - g_1^a - g_1^c) p_0 \left( \frac{1 - q_0}{q_0} \right)^{\beta + 1} \left( \frac{p'}{1 - p'} \right)^{\beta + 1} + (s - g_2^a - g_2^c)(1 - p_0) \left( \frac{1 - q_0}{q_0} \right)^{\beta} \left( \frac{p'}{1 - p'} \right)^{\beta} + p_0 (g_1^a + g_1^c) + (1 - p_0)(g_2^a + g_2^c),$ 

while $V(p_0, q_0) = s$ if $q_0 \leq p'$. 
Moreover, the optimal cut-off for a DM who holds the correct prior, i.e. $q_0 = p_0$, 
is given by 

$p^* = \frac{\beta(s - g_2^a - g_2^c)}{(\beta + 1)(g_1^a + g_1^c - s) + \beta(s - g_2^a - g_2^c)}.$ 

Observe that if there are no processes $(X^b, X^c)$, then Theorem 2.6 reduces to Theorem 2.5. Moreover, the parameter $\beta$ in Theorem 2.6 is the equivalent of $\alpha$ in Theorem 2.3.
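As an illustration, the cut-off formula of Theorem 2.6 can be evaluated numerically. The following is only a minimal sketch: the function name `optimal_cutoff` and the sample numbers are assumptions made for this example, and $\beta$ is taken as given rather than solved from Eq. (2.8).

```python
def optimal_cutoff(beta, s, g_high, g_low):
    """Cut-off p* = beta (s - g_low) / ((beta + 1)(g_high - s) + beta (s - g_low)).

    g_high stands for g_1^a + g_1^c and g_low for g_2^a + g_2^c, i.e. the
    expectations of the payoff-relevant processes (hypothetical argument names).
    """
    if not (g_low < s < g_high):
        raise ValueError("the safe payoff s must lie strictly between g_low and g_high")
    if beta <= 0:
        raise ValueError("beta must be positive")
    return beta * (s - g_low) / ((beta + 1) * (g_high - s) + beta * (s - g_low))


# Illustrative numbers only: s = 0.5, High pays 1 and Low pays 0 in expectation.
p_star = optimal_cutoff(beta=1.0, s=0.5, g_high=1.0, g_low=0.0)
myopic = (0.5 - 0.0) / (1.0 - 0.0)  # belief at which the risky arm breaks even
```

In this example the cut-off lies below the myopic break-even belief, and it approaches that belief as `beta` grows, which matches the reading of $\beta$ as a summary of how informative the observations are.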

The characterization in Theorem 2.6 shows that $\beta$ incorporates all the data that is relevant to the information that the DM has. It depends on the parameters of $(X^a, X^b)$, which the DM observes, but not on the parameters of $(X^c)$, which the DM does not observe. Moreover, it depends on the absolute value of the difference between the drifts, $|\mu_1 - \mu_2|$, the standard deviation of the Brownian motion, $\sigma$,
the Radon- Nikodym derivative ^Ig^j , and the average nvi + (1 — T\)u%. These 
quantities help in distinguishing between the two types of the risky arm. Once the information-relevant parameters are summarized in $\beta$, the only relevant parameters, which affect both the optimal cut-off and the optimal expected payoff, are the expectations of the payoff-relevant processes $(g_i^a, g_i^c)$.
Theorem 2.6 can be used to find the fair price of the additional information $X^b$ and of $X^c$. That is, the price of information that does not affect the DM's payoff, as well as the price of information that affects the DM's payoff. This is done by comparing the optimal value of two bandit problems that differ in the information of the DM.

The next corollary to Theorem 2.6 states that, using cut-off strategies, additional information is more profitable.

Corollary 2.7. Suppose that there are two decision makers, DM1 and DM2, who hold the correct prior, i.e. $q_0 = p_0$, and use cut-off strategies. DM1 receives and observes the payoff process $(X^a(t))$, observes but does not receive the payoff process $(X^b(t))$, and receives but does not observe the payoff process $(X^c(t))$. DM2 receives and observes the payoff process $(X^a(t))$, and receives but does not observe the payoff process $(X^c(t))$. Then the optimal expected payoff of DM1 is higher than the optimal expected payoff of DM2.

2.6. Optimism vs Pessimism. As mentioned before, the phenomenon of overconfidence is common in many decision problems. In this section we apply the result of Section 2.4 and investigate who will fare better in two-armed bandit problems, an optimist who assigns a probability higher than the true probability to the High type, or a pessimist who assigns a probability lower than the true probability to the High type.

Suppose that there are two decision makers, DM1 and DM2, who face independent identical copies of the decision problem. DM1 is an optimist, and believes that the probability of High is $p_0 + \epsilon$, where $\epsilon > 0$, while DM2 is a pessimist, and believes that the probability of High is $p_0 - \epsilon$, and both play optimally given their beliefs. If $p_0 - \epsilon \le p^*$, the pessimist will always choose the safe arm, since according to his subjective belief the prior is at most the cut-off. Assume then that $p_0 - \epsilon > p^*$. For every $\epsilon \in [p^* - p_0, 1 - p_0]$ denote by $V_{p_0}(\epsilon)$ the expected payoff for a DM playing optimally according to the incorrect prior $p_0 + \epsilon$, where $\epsilon$ may be negative. It turns out that the answer regarding who will fare better, an optimist or a pessimist, depends on $\alpha$ that is defined in Eq. (2.2).

Theorem 2.8. Assume that Assumption 2.1 holds.

1. If $\alpha \ge 1$, then for every $\epsilon > 0$ such that $p^* < p_0 \pm \epsilon < 1$ we have $V_{p_0}(\epsilon) > V_{p_0}(-\epsilon)$: an optimist will fare better.

2. If $0 < \alpha < 1$, then for every $\epsilon > 0$ such that $p^* < p_0 \pm \epsilon < \frac{\alpha+2}{3}$ we have $V_{p_0}(\epsilon) > V_{p_0}(-\epsilon)$: an optimist will fare better; for every $\epsilon > 0$ such that $\frac{\alpha+2}{3} < p_0 \pm \epsilon < 1$ we have $V_{p_0}(\epsilon) < V_{p_0}(-\epsilon)$: a pessimist will fare better.

Thus, an optimist will fare better than a pessimist, unless $\alpha$ is low and $p_0 - \epsilon$ is high. That is, a pessimist will fare better only if the following two conditions are satisfied:

1) The pessimist assigns high probability to the High type.

2) The two DMs are sufficiently patient, or it is easy to distinguish between the two types of the risky arm.

Otherwise, an optimist will fare better.

Footnote: It is well known that in one-player optimization problems, additional information cannot hurt the DM, since he can always ignore it. However, a Markovian cut-off strategy does not allow a player to forget additional information, and therefore the statement is not trivial.

Since $p^* = \frac{\alpha(s-g_2)}{(\alpha+1)(g_1-s)+\alpha(s-g_2)}$, the condition $p^* < \frac{\alpha+2}{3}$ is not always satisfied. If $3g_1 + g_2 \ge 4s$, then $p^* < \frac{\alpha+2}{3}$. If $3g_1 + g_2 < 4s$, then $p^* < \frac{\alpha+2}{3}$ if and only if

$$\alpha < \frac{4s - 3g_1 - g_2 - \sqrt{(4s - 3g_1 - g_2)^2 - 8(g_1 - g_2)(g_1 - s)}}{2(g_1 - g_2)} \quad \text{or} \quad \alpha > \frac{4s - 3g_1 - g_2 + \sqrt{(4s - 3g_1 - g_2)^2 - 8(g_1 - g_2)(g_1 - s)}}{2(g_1 - g_2)}.$$



We obtain the same result if the biases of the optimist and the pessimist are geometric rather than absolute, that is, if the initial prior of the optimist is $(1+\epsilon)p_0 = p_0 + \epsilon p_0$ and the initial prior of the pessimist is $(1-\epsilon)p_0 = p_0 - \epsilon p_0$. This is true since the absolute bias of the optimist and the pessimist is the same: $\epsilon p_0$.
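The comparison in Theorem 2.8 can be explored numerically. The sketch below assumes that the payoff of a DM with true prior $p_0$ and subjective prior $q_0 > p^*$, who uses the optimal cut-off $p^*$, has the form $p_0 g_1 + (1-p_0) g_2 + (s-g_1) p_0 R^{\alpha+1} + (s-g_2)(1-p_0) R^{\alpha}$ with $R = \frac{1-q_0}{q_0}\cdot\frac{p^*}{1-p^*}$; this is our reading of the incorrect-prior formula, and all function names and numbers are hypothetical.

```python
def cutoff(alpha, s, g1, g2):
    """Optimal cut-off p* as a function of alpha and the expected payoffs."""
    return alpha * (s - g2) / ((alpha + 1) * (g1 - s) + alpha * (s - g2))


def payoff_with_prior(p0, q0, alpha, s, g1, g2):
    """Expected payoff when the true prior is p0 but the DM acts on q0 (assumed formula)."""
    p_star = cutoff(alpha, s, g1, g2)
    if q0 <= p_star:
        return s  # the DM never leaves the safe arm
    ratio = (1 - q0) / q0 * p_star / (1 - p_star)
    return (p0 * g1 + (1 - p0) * g2
            + (s - g1) * p0 * ratio ** (alpha + 1)
            + (s - g2) * (1 - p0) * ratio ** alpha)


def optimist_minus_pessimist(p0, eps, alpha, s, g1, g2):
    """V_{p0}(eps) - V_{p0}(-eps); a positive value means the optimist fares better."""
    return (payoff_with_prior(p0, p0 + eps, alpha, s, g1, g2)
            - payoff_with_prior(p0, p0 - eps, alpha, s, g1, g2))


# alpha >= 1: the optimist should fare better.
gap_high_alpha = optimist_minus_pessimist(0.6, 0.1, alpha=2.0, s=0.5, g1=1.0, g2=0.0)
# 0 < alpha < 1 and both subjective priors above (alpha + 2) / 3: the pessimist should fare better.
gap_low_alpha = optimist_minus_pessimist(0.95, 0.02, alpha=0.5, s=0.5, g1=1.0, g2=0.0)
```

With these illustrative parameters the sign of the gap switches between the two regimes, in line with the statement of the theorem.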

3. Future directions 

Our results call for further research. We list here a few possible directions for future research.

• We studied the case that the distribution of the High type dominates the distribution of the Low type in a strong sense (see Assumption 2.1). These assumptions ensure that the discontinuities of the process of posterior belief are always in one direction. It would be interesting to know whether a similar characterization holds under only assumptions A1 and A2.

• We assumed that the payoff is distributed according to a Levy process. 
It would be interesting to solve the model when the payoff is distributed 
according to a geometric Levy process. 

• Bolton and Harris (1999), Keller, Rady and Cripps (2004), and Keller and 
Rady (2008) study a strategic version of the model, in which several de- 
cision makers face identical unknown arms. Klein and Rady (2008) study 
a strategic version of the model, in which the arms of two DMs are neg- 
atively correlated. It will be interesting to solve these models, when the 
payoff distributions are general Levy processes. 

4. Proofs 

4.1. Proof of Proposition 2.2. We already argued that $U$ is non-decreasing and convex. It is left to prove that it is continuous. Since $U(p)$ is convex, it is continuous on $(0, 1)$. If the DM had known the true type of the arm, his optimal payoff would have been $pg_1 + (1-p)s$. Since this information is not available, $U(p) \le pg_1 + (1-p)s$. Since $U(p) \ge s$ for every $p$, this implies $\lim_{p \to 0} U(p) = s = U(0)$. Since the DM can follow the strategy that always selects the risky arm, $U(p) \ge pg_1 + (1-p)g_2$. Together with the upper bound above, this implies that $\lim_{p \to 1} U(p) = g_1 = U(1)$. Thus, $U$ is continuous on $[0, 1]$.

4.2. The Control Equation. In this section we express the optimal payoff $U(p)$ using the dynamic programming principle. This representation extends that in Bolton and Harris (1999) to the more general setup of Levy payoff processes. We start by calculating how the belief of the DM is updated given his observations. To simplify notation, it is convenient to divide various expressions by the standard deviation $\sigma$; set $\tilde\mu_1 = \mu_1/\sigma$, $\tilde\mu_2 = \mu_2/\sigma$ and $\tilde\mu = \mu/\sigma$, and denote $d\tilde Y_\theta^k := \frac{1}{\sqrt{k}\sigma}\,dY_\theta^k = \sqrt{k}\tilde\mu\,dt + dZ(t)$. Modulo the factor $\sqrt{k}\sigma$, $d\tilde Y_\theta^k$ is the contribution to the payoff of the linear drift and the Brownian motion. Then the probability of obtaining a specific observation (that is not a jump) given the type is:

P{dY*\jl) = P{dY%\ii) = -j=e B ~^r* = C • eV^&B--*-^- 

v 2-Kdt 

= C(l + y/kfufrjj + o{{dt) 2 )), 
where $C$ is a constant independent of $\tilde\mu$. Denote $\tilde P(d\tilde Y_\theta^k \mid \tilde\mu) := 1 + \sqrt{k}\tilde\mu\,d\tilde Y_\theta^k$.

Let $p = P_t(\theta_1)$ be the belief at time $t$ that the risky arm is High. Then:

p (f)) = PjdY^PjdY^p 

t+dt{ lj p(dy*|ei)P(dy*|^)p+p(dy*|fla)P(dy*|fl a )(i-p) 

PidY^PidY^p 



pidY^PidY^p + p(dY*\e 2 )p(dY£\e 2 )(i - p) 



PidY^PidY^p 



PidY^PidY^p + PidY^PidY^il-p) 



This is the Bayesian belief updating, using the independence of the components in the Levy-Ito decomposition, given the risky-arm type $\theta$. The next lemma expresses the change in the posterior belief over time. The lemma handles separately the case where there are no jumps in the time interval $[t, t + dt)$, and the case where there is a jump in this interval.

Lemma 4.1.

1. Suppose that there are no jumps during the time interval $[t, t + dt)$. Then:

(4.1) $dP_t := P_{t+dt}(\theta_1) - P_t(\theta_1) = p(1-p)\sqrt{k}(\tilde\mu_1 - \tilde\mu_2)\,dZ - p(1-p)k(\nu_1 - \nu_2)\,dt,$

where $dZ = d\tilde Y_\theta^k - \sqrt{k}(p\tilde\mu_1 + (1-p)\tilde\mu_2)\,dt$ is a standard Brownian motion.

2. Suppose that during the interval $[t, t + dt)$ a jump of size $h$ occurred. Then:

(4.2) $P_{t+dt}(\theta_1) = P_h + \sqrt{k}(\tilde\mu_1 - \tilde\mu_2)P_h(1 - P_h)\,dZ_2,$

where $P_h := \frac{p\nu_1(dh)}{p\nu_1(dh) + (1-p)\nu_2(dh)}$, and $dZ_2 := d\tilde Y_\theta^k - \sqrt{k}(\tilde\mu_1 P_h + \tilde\mu_2(1 - P_h))\,dt$.

The first term in the right-hand side of (4.1) is the contribution of the continuous part of the payoff process to the change in the belief, while the second term is the contribution of the fact that no jump arrived. This latter contribution is negative due to Assumption A3. If a jump of size $h$ occurs during the interval $[t, t + dt)$, then the contribution of the continuous part of the payoff process is $(\tilde\mu_1 - \tilde\mu_2)(d\tilde Y_\theta^k - \sqrt{k}(\tilde\mu_1 P_h + \tilde\mu_2(1 - P_h))\,dt)$, and the compound Poisson process' contribution is $P_h = \frac{p\nu_1(dh)}{p\nu_1(dh) + (1-p)\nu_2(dh)}$. The latter is the Bayesian update of the probability that the risky arm is High given that a jump of size $h$ occurred. Note that $p < P_h$.
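The jump update $P_h$ is an ordinary application of Bayes' rule and is easy to compute. The sketch below is illustrative only: the function name and the two arguments, which stand for the values of $\nu_1(dh)$ and $\nu_2(dh)$ at the observed jump size, are assumptions of the example.

```python
def posterior_after_jump(p, nu1_dh, nu2_dh):
    """P_h = p nu1(dh) / (p nu1(dh) + (1 - p) nu2(dh)).

    p is the belief just before the jump; nu1_dh and nu2_dh are the Levy
    intensities of the observed jump size under High and Low respectively.
    """
    denominator = p * nu1_dh + (1 - p) * nu2_dh
    if denominator <= 0:
        raise ValueError("the observed jump must be possible under at least one type")
    return p * nu1_dh / denominator


# A jump that is twice as likely under High moves the belief upwards.
updated = posterior_after_jump(p=0.5, nu1_dh=2.0, nu2_dh=1.0)
```

When the intensity under High exceeds that under Low at the observed jump size, the belief moves up, which is the direction stated in the text.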

Proof of Lemma 4.1. The proof of the lemma is standard and non-inspiring. The first statement follows from a long chain of equalities. Assume that there is no jump in the interval $[t, t + dt)$.

dp _ p{i-v)[P{dY*\e l )p(dY*\e 1 ) - P(dY*\e 2 )P(dY*\9 2 )} 

' P(dYk\8 1 )P(dYk\8 1 )p + P(dY*\e 2 )P(dY>S\8 2 )(l - p) 
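$$\begin{aligned}
&= \frac{p(1-p)\left[(1 + \sqrt{k}\tilde\mu_1 d\tilde Y_\theta^k)e^{-\nu_1 k\,dt} - (1 + \sqrt{k}\tilde\mu_2 d\tilde Y_\theta^k)e^{-\nu_2 k\,dt}\right]}{(1 + \sqrt{k}\tilde\mu_1 d\tilde Y_\theta^k)e^{-\nu_1 k\,dt}\,p + (1 + \sqrt{k}\tilde\mu_2 d\tilde Y_\theta^k)e^{-\nu_2 k\,dt}(1-p)} \\
&= \frac{p(1-p)\left[(1 + \sqrt{k}\tilde\mu_1 d\tilde Y_\theta^k)(1 - \nu_1 k\,dt) - (1 + \sqrt{k}\tilde\mu_2 d\tilde Y_\theta^k)(1 - \nu_2 k\,dt)\right]}{(1 + \sqrt{k}\tilde\mu_1 d\tilde Y_\theta^k)(1 - \nu_1 k\,dt)\,p + (1 + \sqrt{k}\tilde\mu_2 d\tilde Y_\theta^k)(1 - \nu_2 k\,dt)(1-p)} \\
&= \frac{p(1-p)\left[\sqrt{k}(\tilde\mu_1 - \tilde\mu_2)\,d\tilde Y_\theta^k - (\nu_1 - \nu_2)k\,dt\right]}{1 + \sqrt{k}(p\tilde\mu_1 + (1-p)\tilde\mu_2)\,d\tilde Y_\theta^k - k(p\nu_1 + (1-p)\nu_2)\,dt} \\
&= p(1-p)\left[\sqrt{k}(\tilde\mu_1 - \tilde\mu_2)\,d\tilde Y_\theta^k - (\nu_1 - \nu_2)k\,dt\right] \\
&\quad\cdot\left[1 - \sqrt{k}(p\tilde\mu_1 + (1-p)\tilde\mu_2)\,d\tilde Y_\theta^k + k(p\nu_1 + (1-p)\nu_2)\,dt + k(p\tilde\mu_1 + (1-p)\tilde\mu_2)^2\,dt\right] \\
&= p(1-p)\left[\sqrt{k}(\tilde\mu_1 - \tilde\mu_2)\left(d\tilde Y_\theta^k - \sqrt{k}(p\tilde\mu_1 + (1-p)\tilde\mu_2)\,dt\right) - k(\nu_1 - \nu_2)\,dt\right] \\
&= p(1-p)\sqrt{k}(\tilde\mu_1 - \tilde\mu_2)\,dZ - p(1-p)k(\nu_1 - \nu_2)\,dt.
\end{aligned}$$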



In the calculations we used the fact that $dZ = d\tilde Y_\theta^k - \sqrt{k}(p\tilde\mu_1 + (1-p)\tilde\mu_2)\,dt$ is a standard Brownian motion (see Bolton and Harris (1999)), and the Brownian motion properties: $dZ^2 = dt$, and $dZ\,dt = 0$. We also ignored terms of order $(dt)^{3/2}$ and above.
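A quick numerical sanity check of the no-jump update is possible. The sketch below takes $k = 1$ and compares an exact Bayes update for a single Gaussian increment with the linearized expression $p(1-p)(\tilde\mu_1 - \tilde\mu_2)\,dZ - p(1-p)(\nu_1 - \nu_2)\,dt$. It is a check under stated assumptions, not part of the proof: the increment is chosen with $(d\tilde Y)^2 = dt$, mirroring the rule $dZ^2 = dt$, and all names and parameter values are hypothetical.

```python
import math


def exact_update(p, dY, dt, mu1, mu2, nu1, nu2):
    """Exact posterior after observing increment dY and no jump on [t, t + dt)."""
    # Log-likelihood ratio of High vs Low: Gaussian increment N(mu_i dt, dt)
    # times the probability exp(-nu_i dt) that no jump arrives.
    llr = (mu1 - mu2) * dY - 0.5 * (mu1 ** 2 - mu2 ** 2) * dt - (nu1 - nu2) * dt
    odds = p / (1 - p) * math.exp(llr)
    return odds / (1 + odds)


def linearized_update(p, dY, dt, mu1, mu2, nu1, nu2):
    """First-order update p + dp, with dp as in the no-jump case (k = 1)."""
    dZ = dY - (p * mu1 + (1 - p) * mu2) * dt
    return p + p * (1 - p) * (mu1 - mu2) * dZ - p * (1 - p) * (nu1 - nu2) * dt


dt = 1e-4
dY = math.sqrt(dt)  # increment with dY**2 == dt (up to rounding)
args = dict(dt=dt, mu1=1.0, mu2=0.0, nu1=2.0, nu2=1.0)
gap_up = abs(exact_update(0.3, dY, **args) - linearized_update(0.3, dY, **args))
gap_down = abs(exact_update(0.3, -dY, **args) - linearized_update(0.3, -dY, **args))
```

For such increments the two updates differ only by terms of order $(dt)^{3/2}$, which is the order neglected in the calculation above.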



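We now prove the second statement.

$$\begin{aligned}
P_{t+dt}(\theta_1) &= \frac{p\nu_1(dh)(1 + \sqrt{k}\tilde\mu_1 d\tilde Y_\theta^k)}{[\nu_1(dh)p + \nu_2(dh)(1-p)] + \sqrt{k}[\nu_1(dh)\tilde\mu_1 p + \nu_2(dh)\tilde\mu_2(1-p)]\,d\tilde Y_\theta^k} \\
&= \frac{P_h(1 + \sqrt{k}\tilde\mu_1 d\tilde Y_\theta^k)}{1 + \sqrt{k}(\tilde\mu_1 P_h + \tilde\mu_2(1 - P_h))\,d\tilde Y_\theta^k} \\
&= P_h(1 + \sqrt{k}\tilde\mu_1 d\tilde Y_\theta^k)\left[1 - \sqrt{k}(P_h\tilde\mu_1 + (1 - P_h)\tilde\mu_2)\,d\tilde Y_\theta^k + k(P_h\tilde\mu_1 + (1 - P_h)\tilde\mu_2)^2\,dt\right] \\
&= P_h\left[1 + \sqrt{k}(\tilde\mu_1 - \tilde\mu_2)(1 - P_h)\left(d\tilde Y_\theta^k - \sqrt{k}(\tilde\mu_1 P_h + \tilde\mu_2(1 - P_h))\,dt\right)\right] \\
&= P_h + \sqrt{k}(\tilde\mu_1 - \tilde\mu_2)P_h(1 - P_h)\,dZ_2.
\end{aligned}$$

□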

We now formulate the control equation that describes the optimal payoff.

(CE) $U(p) = \max_{k \in [0,1]} \left\{ \left[(1-k)s + k\left(p(\nu_1 H_1 + \mu_1) + (1-p)(\nu_2 H_2 + \mu_2)\right)\right] r\,dt + e^{-r\,dt} E[U(p + dp)] \right\},$

where $k$ is the control variable. The first term within the maximization is the expected instantaneous payoff, and the second term is the discounted expected continuation payoff. The following lemma provides a more convenient form to the control equation, in terms of the derivatives of $U$.

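Lemma 4.2. The following equality holds:

(CE2)
$$\begin{aligned}
U(p) = \max_{k \in [0,1]} \Big\{ &(1-k)s + k\left(p(\nu_1 H_1 + \mu_1) + (1-p)(\nu_2 H_2 + \mu_2)\right) \\
&+ \frac{k}{r}\int \left(p\nu_1(dh) + (1-p)\nu_2(dh)\right) U\!\left(\frac{p\nu_1(dh)}{p\nu_1(dh) + (1-p)\nu_2(dh)}\right) \\
&- \frac{k}{r}\,p(1-p)(\nu_1 - \nu_2)U'(p) - \frac{k}{r}\,(p\nu_1 + (1-p)\nu_2)U(p) \\
&+ \frac{1}{2r}\,kU''(p)p^2(1-p)^2(\tilde\mu_1 - \tilde\mu_2)^2 \Big\}.
\end{aligned}$$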

Proof of Lemma 4.2. Since $U(p)$ is convex, $U(p)$ is twice differentiable $p$-a.s. (in the sense of the Lebesgue measure). With probability $[p(1 - k\nu_1 dt) + (1-p)(1 - k\nu_2 dt)]$ there are no jumps in the interval $[t, t + dt)$. In this case, using the Taylor expansion of $U$, and ignoring terms of order $(dt)^{3/2}$ and higher, we obtain that the optimal payoff is $U(p + dp) = U(p) + U'(p)\,dp + \frac{1}{2}U''(p)(dp)^2$ a.s., where $dp$ is given by the right-hand side of (4.1).

With probability $[pk\nu_1(dh)\,dt + (1-p)k\nu_2(dh)\,dt]$ there is a jump of size $h$, and the optimal payoff is $U(P_h + \sqrt{k}(\tilde\mu_1 - \tilde\mu_2)P_h(1 - P_h)\,dZ_2) = U(P_h) + U'(P_h)\,dP_h + \frac{1}{2}U''(P_h)(dP_h)^2$, where $dP_h := \sqrt{k}(\tilde\mu_1 - \tilde\mu_2)P_h(1 - P_h)\,dZ_2$.



During the subsequent calculations we use the following Eqs. (4.3), (4.4), (4.5) and (4.6), which can be derived from Lemma 4.1.

(4.3) $E[dp] = -kp(1-p)(\nu_1 - \nu_2)\,dt.$

This is the expectation of the change in the belief, given that no jump occurred during the interval $[t, t + dt)$.

(4.4) $E[dp^2] = kp^2(1-p)^2(\tilde\mu_1 - \tilde\mu_2)^2\,dt.$

This is the second moment of the change in the belief, given that no jump occurred during the interval $[t, t + dt)$.

(4.5) $E[dP_h] = C_1 \cdot dt.$

This is the expected contribution of the Brownian motion part to the posterior belief, given that a jump of size $h$ occurred during the interval $[t, t + dt)$.

(4.6) $E[dP_h^2] = C_2 \cdot dt.$

This is the second moment of the contribution of the Brownian motion part to the posterior belief, given that a jump of size $h$ occurred during the interval $[t, t + dt)$. In Eqs. (4.5) and (4.6), $C_1$ and $C_2$ are constants. (Since we can ignore terms of order $(dt)^{3/2}$ and higher, it is sufficient to consider the Taylor expansion up to the second derivative.) Using the above notation, we obtain from (CE):
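(4.7)
$$\begin{aligned}
U(p) = \max_{k \in [0,1]} \Big\{ &\left[(1-k)s + k\left(p(\nu_1 H_1 + \mu_1) + (1-p)(\nu_2 H_2 + \mu_2)\right)\right] r\,dt \\
&+ (1 - r\,dt)\Big( k\,dt \int \Big[ U\!\left(\tfrac{p\nu_1(dh)}{p\nu_1(dh) + (1-p)\nu_2(dh)}\right) \\
&\qquad + U'\!\left(\tfrac{p\nu_1(dh)}{p\nu_1(dh) + (1-p)\nu_2(dh)}\right) C_1\,dt \\
&\qquad + \tfrac{1}{2}U''\!\left(\tfrac{p\nu_1(dh)}{p\nu_1(dh) + (1-p)\nu_2(dh)}\right) C_2\,dt \Big]\left(p\nu_1(dh) + (1-p)\nu_2(dh)\right) \\
&\quad + \left[1 - k(p\nu_1 + (1-p)\nu_2)\,dt\right]\left[U(p) + U'(p)E[dp] + \tfrac{1}{2}U''(p)E[dp^2]\right] \Big) \Big\}.
\end{aligned}$$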



The second, third and fourth lines in (4.7) represent the expected continuation payoff given a jump of size $h$ occurred during the time interval $[t, t + dt)$, and the fifth line represents the expected continuation payoff given no jump occurred during the time interval $[t, t + dt)$. Using $(dt)^2 = 0$, several of the terms in (4.7) vanish, and we obtain:

(4.8)
$$\begin{aligned}
U(p) = \max_{k \in [0,1]} \Big\{ &\left[(1-k)s + k\left(p(\nu_1 H_1 + \mu_1) + (1-p)(\nu_2 H_2 + \mu_2)\right)\right] r\,dt \\
&+ k\,dt \int U\!\left(\tfrac{p\nu_1(dh)}{p\nu_1(dh) + (1-p)\nu_2(dh)}\right)\left(p\nu_1(dh) + (1-p)\nu_2(dh)\right) \\
&+ U(p) - k\,p(1-p)(\nu_1 - \nu_2)U'(p)\,dt - k(p\nu_1 + (1-p)\nu_2)U(p)\,dt - rU(p)\,dt \\
&+ \tfrac{1}{2}kU''(p)p^2(1-p)^2(\tilde\mu_1 - \tilde\mu_2)^2\,dt \Big\} \qquad p\text{-a.s.}
\end{aligned}$$



Eliminating U(p) from both sides, and dividing by dt, we obtain (CE2) after simple 
algebraic manipulations, as desired. □ 



From Eq. (4.7) it follows that the contribution of the continuation payoff given that a jump of size $h$ occurred during the time interval $[t, t + dt)$ is

(4.9)
$$k\,dt \int \left[ U\!\left(\tfrac{p\nu_1(dh)}{p\nu_1(dh) + (1-p)\nu_2(dh)}\right) + U'\!\left(\tfrac{p\nu_1(dh)}{p\nu_1(dh) + (1-p)\nu_2(dh)}\right) C_1\,dt + \tfrac{1}{2}U''\!\left(\tfrac{p\nu_1(dh)}{p\nu_1(dh) + (1-p)\nu_2(dh)}\right) C_2\,dt \right]\left(p\nu_1(dh) + (1-p)\nu_2(dh)\right).$$

The parameters of the Brownian motion affect (4.9) only through $C_1$ and $C_2$, and since $C_1$ and $C_2$ do not appear in Eq. (4.8), it follows that if a jump occurs during the time interval $[t, t + dt)$, the information from the compound Poisson process has more impact than the information of the Brownian motion.



According to (CE2), the payoff is the maximum over the control variable $k$ of the expectation of the current flow payoff $[(1-k)s + k(p(\nu_1 H_1 + \mu_1) + (1-p)(\nu_2 H_2 + \mu_2))]$ plus the discounted value of the continuation payoff

$$\frac{k}{r}\left[\int \left(p\nu_1(dh) + (1-p)\nu_2(dh)\right) U\!\left(\tfrac{p\nu_1(dh)}{p\nu_1(dh) + (1-p)\nu_2(dh)}\right) - p(1-p)(\nu_1 - \nu_2)U'(p) + \tfrac{1}{2}U''(p)p^2(1-p)^2(\tilde\mu_1 - \tilde\mu_2)^2 - (p\nu_1 + (1-p)\nu_2)U(p)\right].$$

A solution $k$ to this maximization problem must satisfy:

(4.10)
$$k \begin{cases} = 0 & \text{if } b(p, U) < s - [pg_1 + (1-p)g_2], \\ \in [0,1] & \text{if } b(p, U) = s - [pg_1 + (1-p)g_2], \\ = 1 & \text{if } b(p, U) > s - [pg_1 + (1-p)g_2], \end{cases}$$

where

$$b(p, U) = \frac{1}{r}\left[\int \left(p\nu_1(dh) + (1-p)\nu_2(dh)\right) U\!\left(\tfrac{p\nu_1(dh)}{p\nu_1(dh) + (1-p)\nu_2(dh)}\right) - p(1-p)(\nu_1 - \nu_2)U'(p) - (p\nu_1 + (1-p)\nu_2)U(p) + \tfrac{1}{2}U''(p)p^2(1-p)^2(\tilde\mu_1 - \tilde\mu_2)^2\right].$$

The function within the maximization in (CE2) is linear in $k$. Therefore, it achieves its maximum at $k = 1$ or $k = 0$, $p$-a.s. From Proposition 2.2, $U(p)$ is non-decreasing and continuous, and therefore there is $p^*$ such that $U(p) = s$ for every $p \le p^*$. Thus $k = 0$ is optimal for $p \le p^*$. For every $p > p^*$, we have $U(p) > s$, so that in this case $k = 1$ is optimal $p$-a.s.



4.3. Characterizing the optimal strategy and the value. When it is optimal to play safe, that is, when the optimal solution of (CE) is $k^* = 0$, we have $U(p) = s$. When it is optimal to play risky, that is, when the optimal solution of (CE) is $k^* = 1$, it follows from Lemma 4.2 that $U(p)$ solves the following functional differential equation:

(FDE)
$$\begin{aligned}
U(p) = {} & p(\nu_1 H_1 + \mu_1) + (1-p)(\nu_2 H_2 + \mu_2) \\
&+ \frac{1}{r}\int \left(p\nu_1(dh) + (1-p)\nu_2(dh)\right) U\!\left(\tfrac{p\nu_1(dh)}{p\nu_1(dh) + (1-p)\nu_2(dh)}\right) \\
&- \frac{1}{r}\,p(1-p)(\nu_1 - \nu_2)U'(p) - \frac{1}{r}\,(p\nu_1 + (1-p)\nu_2)U(p) \\
&+ \frac{1}{2r}\,U''(p)p^2(1-p)^2(\tilde\mu_1 - \tilde\mu_2)^2 \qquad \text{a.s. in } (p^*, 1).
\end{aligned}$$

A solution $U(p)$ for this equation must be smooth (Friedman (1969), p. 56). Therefore, $U(p)$ satisfies Eq. (FDE) in $(p^*, 1)$ always, and $k = 1$ is optimal in $(p^*, 1)$. Hence, there is an optimal cut-off strategy $\kappa^*$ with cut-off point $p^*$.

12 Recall that Eq. (CE2) is satisfied $p$-a.s., since $U'(p)$ and $U''(p)$ exist $p$-a.s. In the next Section we show that the optimal strategy is in fact a cut-off strategy.

13 To see that the conditions of Friedman (1969) are satisfied, substitute $f(p) = \int \left(p\nu_1(dh) + (1-p)\nu_2(dh)\right) U\!\left(\tfrac{p\nu_1(dh)}{p\nu_1(dh) + (1-p)\nu_2(dh)}\right) - U(p)\left(r + (p\nu_1 + (1-p)\nu_2)\right)$. Since $U(p)$ is continuous, so is $f$, and we get $U \in C^2$, as Friedman (1969) requires.

The next lemma suggests one solution to Eq. (FDE). To prove it, substitute the expression for $U(p)$ defined in Eq. (4.11) below into Eq. (FDE). Recall that

$$f(\eta) = \int \nu_2(dh)\left(\frac{\nu_2(dh)}{\nu_1(dh)}\right)^{\eta} + \eta(\nu_1 - \nu_2) - \nu_2 + \frac{1}{2}(\eta + 1)\eta(\tilde\mu_1 - \tilde\mu_2)^2 - r$$

(see (2.2)), that $\alpha$ is the unique solution of the equation $f(\eta) = 0$ in $(0, \infty)$, and that $p^*$ and $C_\alpha$ were defined in the statement of Theorem 2.3.

Lemma 4.3. One smooth solution to Eq. (FDE) is

(4.11) $U(p) = pg_1 + (1-p)g_2 + C_\alpha(1-p)\left(\frac{1-p}{p}\right)^{\alpha},$

where $\alpha \in (0, \infty)$ solves the equation $f(\eta) = 0$.

In order to see that the function $U(p)$, defined above, actually solves (FDE), we use the fact that $P_h = \frac{p\nu_1(dh)}{p\nu_1(dh) + (1-p)\nu_2(dh)} \ge p^*$ for every $p > p^*$, $\nu_1$-a.s., which is equivalent to Assumption A3. Thus, the form of our solution crucially depends on this assumption.
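The claim of Lemma 4.3 can be checked numerically in the simplest special case. The sketch below assumes that there are no jumps, so that (FDE) reduces to $U(p) = pg_1 + (1-p)g_2 + \frac{1}{2r}U''(p)p^2(1-p)^2\Delta^2$ with $\Delta = \tilde\mu_1 - \tilde\mu_2$, and $f(\eta) = \frac{1}{2}\eta(\eta+1)\Delta^2 - r$. The constant `C` is arbitrary, the second derivative is approximated by finite differences, and all names are hypothetical.

```python
import math


def alpha_brownian(r, delta):
    """Positive root of 0.5 * eta * (eta + 1) * delta**2 - r = 0 (no-jump case)."""
    return (-1 + math.sqrt(1 + 8 * r / delta ** 2)) / 2


def candidate_value(p, C, alpha, g1, g2):
    """Candidate solution p g1 + (1 - p) g2 + C (1 - p) ((1 - p) / p)**alpha."""
    return p * g1 + (1 - p) * g2 + C * (1 - p) * ((1 - p) / p) ** alpha


def fde_residual(p, C, r, delta, g1, g2, h=1e-4):
    """Left-hand side minus right-hand side of the no-jump equation at belief p."""
    alpha = alpha_brownian(r, delta)
    u = lambda x: candidate_value(x, C, alpha, g1, g2)
    second_derivative = (u(p + h) - 2 * u(p) + u(p - h)) / h ** 2
    flow = p * g1 + (1 - p) * g2
    return u(p) - flow - second_derivative * p ** 2 * (1 - p) ** 2 * delta ** 2 / (2 * r)


residual = fde_residual(p=0.6, C=0.3, r=1.0, delta=1.0, g1=1.0, g2=0.0)
```

The residual vanishes up to discretization error for any value of `C`, which is why the boundary conditions at $p^*$ are needed to pin down $C_\alpha$.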

In fact, one can verify that 

U( P ) = pgi + (1 - p) 9 2 + C(l - P ) i^y^j + - P) ' 

solves Eq. (FDE), where $\alpha$ is as in the statement of Lemma 4.3 and $\beta$ is the unique solution of $f(\eta) = 0$ in $(-\infty, 0)$. The following lemma assures that $\alpha$ is well defined.



d. 

Lemma 4.4. The equation $f(\eta) = 0$ admits a unique solution in the interval $(0, \infty)$.

Proof.

The function $f$ is a continuous function that satisfies $f(0) < 0$ and $f(\infty) = \infty$. To show that $f(\eta) = 0$ has a unique solution, it is therefore sufficient to prove that $f$ is increasing in $\eta$. Note that if $\tilde\mu_1 \ne \tilde\mu_2$, then $\frac{1}{2}(\eta + 1)\eta(\tilde\mu_1 - \tilde\mu_2)^2 - r - \nu_2$ is increasing in $\eta$, and constant otherwise. It remains to prove that if $\nu_1 \ne \nu_2$ (i.e. $\nu_1(\mathbb{R} \setminus \{0\}) > \nu_2(\mathbb{R} \setminus \{0\})$), then $\int \nu_2(dh)\left(\frac{\nu_2(dh)}{\nu_1(dh)}\right)^{\eta} + \eta(\nu_1 - \nu_2)$ is increasing in $\eta$.
Since 



v ^ dh ) ^71^ + v(Mdh) - v 2 {dh)) 
v x (dh) ) 



and 

/ V2 ^-\ + n(u, (dh) - m(dh))] = 0, 



/ 

aw . 



v 2 {dh) ( -^j^jj + viMdh) - v 2 {dh)) 



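it is sufficient to prove that for $\nu_1$-a.e. $h \in \left\{\frac{\nu_2(dh)}{\nu_1(dh)} < 1\right\}$,

$$g_h(\eta) = \nu_2(dh)\left(\frac{\nu_2(dh)}{\nu_1(dh)}\right)^{\eta} + \eta(\nu_1(dh) - \nu_2(dh))$$

is increasing in $\eta$. Now,

$$g_h'(\eta) = \nu_2(dh)\left(\frac{\nu_2(dh)}{\nu_1(dh)}\right)^{\eta}\ln\left(\frac{\nu_2(dh)}{\nu_1(dh)}\right) + (\nu_1(dh) - \nu_2(dh)) \ge -\nu_2(dh)\ln\left(\frac{\nu_1(dh)}{\nu_2(dh)}\right) + (\nu_1(dh) - \nu_2(dh)) > 0,$$

where the first inequality holds since $\eta > 0$ and by Assumption A3. The second inequality holds since $-\ln(x) + x - 1 > 0$ for every $x \ne 1$. Therefore, $g_h(\eta)$ is increasing, as desired. □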
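Since $f$ is continuous and increasing with $f(0) < 0$, its positive root can be found by bisection. The following sketch is illustrative and rests on assumptions that are not in the text: the Levy measures are taken to have finite support and are passed as dictionaries mapping jump sizes to intensities (with every jump size of the Low type also possible under the High type), and `delta` stands for $\tilde\mu_1 - \tilde\mu_2$.

```python
def f(eta, nu1, nu2, delta, r):
    """f(eta) for finitely supported Levy measures nu1 (High) and nu2 (Low)."""
    total1 = sum(nu1.values())
    total2 = sum(nu2.values())
    integral = sum(nu2[h] * (nu2[h] / nu1[h]) ** eta for h in nu2)
    return (integral + eta * (total1 - total2) - total2
            + 0.5 * (eta + 1) * eta * delta ** 2 - r)


def solve_alpha(nu1, nu2, delta, r, iterations=200):
    """Bisection for the unique positive root of f, using that f is increasing."""
    lo, hi = 0.0, 1.0
    while f(hi, nu1, nu2, delta, r) <= 0:
        hi *= 2
        if hi > 1e12:
            raise ValueError("no sign change found; the two types may be indistinguishable")
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if f(mid, nu1, nu2, delta, r) <= 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Pure Brownian case: f(eta) = 0.5 * eta * (eta + 1) - 1, whose positive root is 1.
alpha_bm = solve_alpha({}, {}, delta=1.0, r=1.0)
# Pure Poisson case with unit jumps at rates 2 (High) and 1 (Low).
alpha_poisson = solve_alpha({1.0: 2.0}, {1.0: 1.0}, delta=0.0, r=1.0)
```

The guard on `hi` covers the degenerate case in which neither the drifts nor the jump intensities differ, where $f$ is constant and no root exists.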

We now prove that Eq. (FDE) has a unique solution.

Lemma 4.5. For every $p_1 < p_2$, and every $u_1, u_2 \in \mathbb{R}$, there is a unique solution $U(p)$ satisfying Eq. (FDE) in the interval $(p_1, p_2)$ with the boundary conditions $U(p_1) = u_1$, $U(p_2) = u_2$.

Proof.

Since Eq. (FDE) is a non-homogenous linear equation in $U$, if there are two solutions of Eq. (FDE), then their difference is a solution of the homogenous version of Eq. (FDE). To prove the lemma, it is therefore sufficient to fix a solution $W$ of the homogenous version of Eq. (FDE) that satisfies $W(p_1) = W(p_2) = 0$ and to prove that $W = 0$.

Suppose that $W$ achieves its maximum at $p$. Then $W'(p) = 0$, therefore:

$$W(p) = \frac{1}{r}\int \left(p\nu_1(dh) + (1-p)\nu_2(dh)\right) W\!\left(\tfrac{p\nu_1(dh)}{p\nu_1(dh) + (1-p)\nu_2(dh)}\right) + \frac{1}{2r}W''(p)p^2(1-p)^2(\tilde\mu_1 - \tilde\mu_2)^2 - \frac{1}{r}(p\nu_1 + (1-p)\nu_2)W(p).$$

Moreover, since the maximum is achieved at $p$, $W''(p) \le 0$, and simple algebraic manipulations imply that:

$$\begin{aligned}
(r + p\nu_1 + (1-p)\nu_2)W(p) &= \int \left(p\nu_1(dh) + (1-p)\nu_2(dh)\right) W\!\left(\tfrac{p\nu_1(dh)}{p\nu_1(dh) + (1-p)\nu_2(dh)}\right) + \frac{1}{2}W''(p)p^2(1-p)^2(\tilde\mu_1 - \tilde\mu_2)^2 \\
&\le W(p)\int \left(p\nu_1(dh) + (1-p)\nu_2(dh)\right) \\
&= (p\nu_1 + (1-p)\nu_2)W(p).
\end{aligned}$$

Since $r > 0$ and the maximum of $W$ is at least $W(p_1) = 0$, we conclude that $W(p) = 0$. A similar argument shows that the minimum of $W$ in $(p_1, p_2)$ is 0, so that $W(p) = 0$ on $(p_1, p_2)$, as desired. □

As mentioned before, there is an optimal cut-off strategy with corresponding payoff $U$. Lemmas 4.3, 4.4 and 4.5 prove that $U$ is the unique solution of Eq. (FDE). We now prove Theorem 2.3, which provides an explicit form to the optimal strategy and to the payoff function.

Proof of Theorem 2.3.

Recall that $\kappa^*$ is the optimal cut-off strategy (with cut-off point $p^*$). To complete the proof of the theorem, we provide an explicit expression to $p^*$ and to $U$. To this end, we first prove that the right-hand derivative of $U$ at $p^*$ is 0.

Let $U'_R(p^*) = \lim_{\epsilon \to 0^+} \frac{U(p^* + \epsilon) - U(p^*)}{\epsilon}$ be the right derivative of $U$ at the cut-off point $p^*$. Since $U$ is convex, $U'_R(p^*)$ is well defined. Since $U$ is non-decreasing, $U'_R(p^*) \ge 0$. We now prove that $U'_R(p^*) \le 0$. For every $q_0 \in [0, 1]$, let $\kappa(q_0)$ be the strategy that plays as $\kappa$, assuming the prior belief is $q_0$ rather than $p_0$. Define $M_\kappa := \int_0^\infty re^{-rt}\,dY_\kappa(t)$, the discounted payoff under the strategy $\kappa$. Then

$$\lim_{\epsilon \to 0^+} E[M_{\kappa^*(p^* + \epsilon)} \mid \theta] = s \qquad \forall \theta \in \{\theta_1, \theta_2\}.$$

Indeed, as explained before, the posterior belief will drop below $p^*$ in an infinitesimal time interval around 0. Therefore as $\epsilon$ goes to 0, the probability that the DM stops "quite fast" under $\kappa^*(p^* + \epsilon)$ goes to 1.



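Since $\kappa^*$ is the optimal cut-off strategy, and since it is independent of the prior belief $p_0$, we deduce that for every $p \in [0, 1]$, $U(p) = V_{\kappa^*(p)}(p)$. Therefore:

(4.12)
$$\begin{aligned}
U'_R(p^*) &= \lim_{\epsilon \to 0^+} \frac{U(p^* + \epsilon) - U(p^*)}{\epsilon} = \lim_{\epsilon \to 0^+} \frac{V_{\kappa^*(p^* + \epsilon)}(p^* + \epsilon) - V_{\kappa^*(p^*)}(p^*)}{\epsilon} \\
&= \lim_{\epsilon \to 0^+} \frac{1}{\epsilon}\Big[(p^* + \epsilon)E[M_{\kappa^*(p^* + \epsilon)} \mid \theta_1] + (1 - p^* - \epsilon)E[M_{\kappa^*(p^* + \epsilon)} \mid \theta_2] \\
&\qquad\qquad - p^*E[M_{\kappa^*(p^*)} \mid \theta_1] - (1 - p^*)E[M_{\kappa^*(p^*)} \mid \theta_2]\Big] \\
&= \lim_{\epsilon \to 0^+} \Big\{\frac{1}{\epsilon}\Big[p^*E[M_{\kappa^*(p^* + \epsilon)} \mid \theta_1] + (1 - p^*)E[M_{\kappa^*(p^* + \epsilon)} \mid \theta_2] \\
&\qquad\qquad - p^*E[M_{\kappa^*(p^*)} \mid \theta_1] - (1 - p^*)E[M_{\kappa^*(p^*)} \mid \theta_2]\Big] \\
&\qquad\quad + E[M_{\kappa^*(p^* + \epsilon)} \mid \theta_1] - E[M_{\kappa^*(p^* + \epsilon)} \mid \theta_2]\Big\}.
\end{aligned}$$

By the optimality of $U(p)$, $U(p^*) \ge V_{\kappa^*(p^* + \epsilon)}(p^*)$, and therefore,

(4.13)
$$\begin{aligned}
&\lim_{\epsilon \to 0^+} \frac{1}{\epsilon}\Big[p^*E[M_{\kappa^*(p^* + \epsilon)} \mid \theta_1] + (1 - p^*)E[M_{\kappa^*(p^* + \epsilon)} \mid \theta_2] - p^*E[M_{\kappa^*(p^*)} \mid \theta_1] - (1 - p^*)E[M_{\kappa^*(p^*)} \mid \theta_2]\Big] \\
&\quad = \lim_{\epsilon \to 0^+} \frac{1}{\epsilon}\left[V_{\kappa^*(p^* + \epsilon)}(p^*) - V_{\kappa^*(p^*)}(p^*)\right] = \lim_{\epsilon \to 0^+} \frac{1}{\epsilon}\left[V_{\kappa^*(p^* + \epsilon)}(p^*) - U(p^*)\right] \le 0,
\end{aligned}$$

and

(4.14) $\lim_{\epsilon \to 0^+} \left(E[M_{\kappa^*(p^* + \epsilon)} \mid \theta_1] - E[M_{\kappa^*(p^* + \epsilon)} \mid \theta_2]\right) = 0.$

Substituting (4.13) and (4.14) in (4.12) we deduce that $U'_R(p^*) \le 0$.

Footnote: By the same argument we get that $V_{\kappa'}(p)$ is continuous in $p'$, where $\kappa'$ is a cut-off strategy with cut-off point $p'$.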
As mentioned in Proposition 2.2, $U(p^*) = s$, and $U(1) = g_1$. By Lemma 4.5, the unique solution of Eq. (FDE) on $(p^*, 1]$ is $U(p) = pg_1 + (1-p)g_2 + C_\alpha(1-p)\left(\frac{1-p}{p}\right)^{\alpha}$. By imposing $U(p^*) = s$ and $U'_R(p^*) = 0$ we get

$$p^* = \frac{\alpha(s - g_2)}{(\alpha + 1)(g_1 - s) + \alpha(s - g_2)}, \qquad \text{and} \qquad C_\alpha = \frac{s - g_2 - p^*(g_1 - g_2)}{(1 - p^*)\left(\frac{1 - p^*}{p^*}\right)^{\alpha}}.$$

Uniqueness follows by (4.10).

□
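The two boundary conditions used at the end of the proof, value matching $U(p^*) = s$ and a vanishing right derivative at $p^*$, can be verified numerically for the closed-form expressions. The sketch below takes $\alpha$ as given; the function names and the sample parameters are assumptions of the example, and the derivative is approximated by a one-sided finite difference.

```python
def cutoff(alpha, s, g1, g2):
    """p* = alpha (s - g2) / ((alpha + 1)(g1 - s) + alpha (s - g2))."""
    return alpha * (s - g2) / ((alpha + 1) * (g1 - s) + alpha * (s - g2))


def constant(alpha, s, g1, g2):
    """C_alpha chosen so that the value equals s at the cut-off."""
    p_star = cutoff(alpha, s, g1, g2)
    return (s - g2 - p_star * (g1 - g2)) / ((1 - p_star) * ((1 - p_star) / p_star) ** alpha)


def value(p, alpha, s, g1, g2):
    """Optimal payoff: s up to the cut-off, the closed-form expression above it."""
    p_star = cutoff(alpha, s, g1, g2)
    if p <= p_star:
        return s
    c_alpha = constant(alpha, s, g1, g2)
    return p * g1 + (1 - p) * g2 + c_alpha * (1 - p) * ((1 - p) / p) ** alpha


params = dict(alpha=1.0, s=0.5, g1=1.0, g2=0.0)
p_star = cutoff(**params)
step = 1e-6
right_slope = (value(p_star + step, **params) - value(p_star, **params)) / step
```

With these illustrative numbers the cut-off is $1/3$ and $C_\alpha = 1/8$; the one-sided slope at the cut-off is zero up to discretization error, and the value reaches $g_1$ at $p = 1$.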



4.4. Incorrect Prior. To find the expected discounted payoff for a DM who plays according to an incorrect prior $q_0$, we present here a condition which is equivalent
to dM]). Substituting Y^(t) = fit + aZ(t) and p = % in ([2"1>]) . we get: 



In I 



Z(t)+ 2 -MLZJ±pl±i _ t > __J_ l n 

It follows that the DM selects the risky arm until the first time $t$ at which the following is satisfied:



3o 

1-90 



f^Vjl lnf 



1/1 (dhj) 
U2{dhj) 



(4.15) 

S"(t) := Z(t) + 
1 



2/i — /2i - /t 2 



^1 - ^2 



Ml - M2 



< -- 



Mi - M2 



In 



ft) 



1 - go 



in 



1 -p' 



ll.-i — tin — ^ 



V\ (dhj ) ' 

/'1 - /12 ^ \v 2 {dhj) i 



Proof of Theorem 2.5. If the prior belief of the DM, $q_0$, satisfies $q_0 > p'$ then there is a bijection between $E$ and $q_0$. If $q_0 \le p'$, then the DM always chooses the safe arm, which is equivalent to $E = 0$. Therefore, we will use the notation $U(p_0, E)$ instead of $U(p_0, q_0)$ when the former is more convenient. We now prove that under Assumption 2.1 for every $p_0 \in [0, 1]$ and every $E \in [0, \infty)$, the payoff of the DM is

$$U(p_0, E) = p_0 g_1 + (1 - p_0)g_2 + (s - g_1)p_0 e^{-(\tilde\mu_1 - \tilde\mu_2)(\alpha + 1)E} + (s - g_2)(1 - p_0)e^{-(\tilde\mu_1 - \tilde\mu_2)\alpha E}.$$

Using Eq. (4.15), we construct an integral equation, to find the utility function for a DM who has a prior belief $q_0$. Let $\tau$ be the stopping time of the first jump. Let $T$ be the first time $t$ satisfying (4.15). The DM chooses the risky arm until the stopping time $T$, and then he switches to the safe arm. The calculations use dynamic programming in which the continuation payoff is determined by the time of the first jump, $\tau$, and the value of the continuous part of the payoff at that time.



Notation and Formulas. The proof requires computations that rely on some results on Brownian motion. In this subsection we provide these results, which are derived using Borodin and Salminen (1996) p. 197-223. Recall that $\tilde\mu = \mu/\sigma$ is determined by $\theta$. Notice that $B^\mu(t)$ is a standard Brownian motion with drift



F u := 



2/i - Mi - Ma 



v\ - v 2 



dx, t = t 



2 

r < T). 



t. Define fi* t)1t<t (x, t) := Pg(B^r) £ 
Mi — M2 J v w " 

This is the probability that the first jump occurs in the interval $[t, t + dt)$, and $B^\mu$ belongs to $[x, x + dx)$, given a jump occurs before the DM switches to the safe arm.

Denote
(4.16) 

Pt,h,x ■= P(0i\t < T,t = t, B»(t) G dx, h) 

_ P(t <T,r = t, B»{t) e dx, h\0i)P{6i) 

" P(t<T,t = t, B»(t) G dx, /i|0i)P(0i) + P(t < T, t = t, B»{t) G dx, ft|0 2 )P(0 2 ) 

PoPei(r < r)/ ( ^ M1(T))T)|T<T (x,t)i/i(dft)/Pi 
PoP(9i(t < r)/ ( % 1(r))T)|r<T (a;,i)i/i(dft)/Pi + (1 -Po)Pe 2 (r < r)/ ( e | M2(T))T)|T<T (ar,t)i/ 2 (dft)/p2 

This is the posterior belief that the type is $\theta_1$, given that (a) the first jump that occurred in the time interval $[t, t + dt)$ has size $h$; (b) it occurred before the DM switched to the safe arm; and (c) the Brownian motion with drift, $B^\mu(t)$, is in the interval $[x, x + dx)$.

The probability that the DM switches to the safe arm before the first jump appeared is

(4.17) $P_\theta(\tau > T) = P_\theta\left(\inf_{0 \le s \le \tau} B^\mu(s) \le -E\right) = e^{-E\left(F_\mu + \sqrt{2\nu + F_\mu^2}\right)}.$
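For orientation, the closed form in (4.17) is straightforward to evaluate. The sketch below assumes our reading of that equation, namely that for a Brownian motion with drift $F_\mu$ and an independent exponential jump time with rate $\nu$ the probability of reaching $-E$ before the jump is $e^{-E(F_\mu + \sqrt{2\nu + F_\mu^2})}$; the function and argument names are hypothetical.

```python
import math


def prob_switch_before_jump(E, drift, jump_rate):
    """Probability that the drifted Brownian motion hits -E before the first jump.

    E is the distance to the stopping boundary, drift plays the role of F_mu,
    and jump_rate is the total jump intensity of the risky arm (assumed reading).
    """
    if E < 0 or jump_rate < 0:
        raise ValueError("E and jump_rate must be non-negative")
    return math.exp(-E * (drift + math.sqrt(2 * jump_rate + drift ** 2)))


# With zero drift and jump rate 1/2 the exponent reduces to -E.
example = prob_switch_before_jump(E=2.0, drift=0.0, jump_rate=0.5)
```

The probability equals one when $E = 0$ and decreases as the boundary moves further away or as jumps become more frequent.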



The expected discounted payoff from the continuous part of the risky arm, until the switching time to the safe arm, given the DM switched before the first jump, is

(4.18)
$$\begin{aligned}
E_\theta\left[\int_0^T re^{-rt}\,dY_\theta(t) \,\middle|\, \tau > T\right] &= \mu E_\theta\left[\int_0^T re^{-rt}\,dt \,\middle|\, \tau > T\right] + \sigma E_\theta\left[\int_0^T re^{-rt}\,dZ_t \,\middle|\, \tau > T\right] \\
&= \mu E_\theta\left[1 - e^{-rT} \mid \tau > T\right] = \mu\left(1 - E_\theta\left[e^{-rT} \mid \tau > T\right]\right).
\end{aligned}$$

The expected discounted payoff from the safe arm, after the switching time to the safe arm, given the DM switched before the first jump, is

(4.19) $E_\theta\left[\int_T^\infty re^{-rt}s\,dt \,\middle|\, \tau > T\right] = sE_\theta\left[e^{-rT} \mid \tau > T\right].$

The expected discounted payoff from the continuous part of the risky arm, until the first jump occurs, given the first jump occurred before the switching time, is

(4.20) $E_\theta\left[\int_0^\tau re^{-rt}\,dY_\theta(t) \,\middle|\, \tau < T\right] = \mu\left(1 - E_\theta\left[e^{-r\tau} \mid \tau < T\right]\right).$

The expression on the right-hand side of (4.19) can be re-written using the following list of equalities:



(4.21) 

P r — rT I ^ T] E e [e- rT ,T>T] Je-^PejTedt^hK^dh 

hg[e \T > 1 = — = — 

Pe(T > T) Pe(r > T) 

J e~ rtl P e (T g dh)P e (h < T)dh _ J e- rtl P e {T Gdt^e-^dh 

Po{t>T) ~ P 6 (t>T) 

~ P 9 (t > T) ~ P e {r > T) 

= P e (T > T) = P 9 (t > T) ' 



where $P_\theta(\tau^r > t) = e^{-(r + \nu)t}$, and $P_\theta(\tau^r > T) = e^{-E\left(F_\mu + \sqrt{2(\nu + r) + F_\mu^2}\right)}$. Similarly, the expression on the right-hand side of (4.20) can be re-written as follows:



(4.22) 

E g [e \t < T\ = J e 1 P e (r G d*i|r < T)dii = y P e ( T < T) 

J Pe(r<T) 1 7 Pe(r<T) 



i> + r . 
v 

v + r , 



(1/ + r) ; ; dt-l 

f Pe(r r edtjPgjhKT) v P e {T r <T) 

P 6 (t<T) 1 D + rP e {T<T)- 
(4.23)

f e-^Pe | inf BHs) > -E, B»(t) G dx, t e dt 
(xt)e-^dt-- ^- <( 



Po(t < T) 

J e'^Pg f inf B»(s) > -E,B»(t) € dx) P 9 {t g dt)dt 
Pe(r < T) 

/ e-^Pg ^inf Bf(s) > -E,B»(t) G dxj ve~ 9t dt 
Pe(r < T) 

f(D + -f)e- ( -~ l+ ^ t P e ( inf B»(s) > -E,B»(t) G ) 
jy J \0<s<t J 

' Pe{r < T) 

/ P 8 f inf ^(s) > -E,B»(t) € dx^j P e (^ G dt)dt 
' Ps(t<T) 

fPgf inf B»(s) > -E, B^U) G dx, rT G dt] dt 

v J \0<s<t I 



i> + j Pb(t<T) 

v P 8 {B^(t-<) G dx,^ <T) 
v + 1 P (t < T) ' 
(4.24)



oo roo 



-E JO 



f(BVi(T),T)\T<T( X > t ) e 1 e 



ft -Sx 



dtdx 



v S°° E P e {B»(T"i) g dx,Tf < T)e- &x dx 



v + 7 



Pe{r < T) 



(v + 7)P e (r < T) J_ E \0<s<tt 



P e ( inf B"(s) > -E, ^(t 7 ) G ) e^dx 



(^ + 7)P (r <T) 
J/ + 7 



PeiB^iT') e dx)e- d3! - P e inf B^fs) < -E,B^(t~<) e dx 

' 0<S<Tt 



a (F M -5)x- 1x1^/2(5+7)+^ _ -x(^2(P+ 7 )+F2+5-F tl ) -2£ % /2(P+7)+_F2 



(ix 



2(P + 7 ) + ^(r<T) 



; -x( % /2(P+7)+F2+5-^) + / e x( v /2(i>+7)+F2-5+F fl ) 



— B 



-2B v /2(P+ 7 )+F2 / x ( v /2(i/+ 7 )+F2+5-F, 1 ) 



1/(1 - e -- E (v /2 ( p+ ^ +F '+ F "- 5 )) 
(^ + 7 + <5^-<5 2 /2)Pe(r <T)' 



where the first equality follows by (4.23).

Constructing the integral equation. The DM chooses the risky arm until the minimum between the stopping time of the first jump $\tau$ and the stopping time $T$. We distinguish between two cases. In case the DM stops before the time of the first jump $\tau$, we calculate the expected discounted payoff from the risky arm until the stopping time $T$, and the expected discounted payoff from the safe arm after the stopping time $T$. In case the first jump occurs before the stopping time $T$, we calculate the expected discounted payoff received from the continuous part $Y_\theta(t)$ until time $\tau$. We add the expected discounted payoff from the first jump, and the expected discounted continuation payoff, updating both the posterior $p_{t,h,x}$, and the intercept $E + G_h + x$, according to the time of the first jump, the first jump's size, and the value of the continuous part of the payoff.

The notation used is as follows: $P_\theta(\tau > T)$ is the probability that the DM switches to the safe arm before a jump occurs. If $\tau > T$ then the expected payoff from the risky arm is $E_\theta[\int_0^T re^{-rt}\,dY_\theta(t) \mid \tau > T]$, and the expected payoff from the safe arm is $E_\theta[\int_T^\infty re^{-rt}s\,dt \mid \tau > T]$. $P_\theta(\tau < T)$ is the probability a jump occurs before the DM switches to the safe arm. If $\tau < T$ then the expected payoff until the first jump is $E_\theta[\int_0^\tau re^{-rt}\,dY_\theta(t) \mid \tau < T]$. $f^\theta_{(B^\mu(\tau),\tau) \mid \tau < T}(x, t)$ is the probability that the first jump occurs in the interval $[t, t + dt)$, and $B^\mu(\tau)$ belongs to $[x, x + dx)$, given that a jump occurs before the DM switches to the safe arm. $re^{-rt}\frac{1}{\nu}\int h\,\nu(dh)$ is the expected discounted payoff from the first jump, and $\frac{1}{\nu}\int \nu(dh)e^{-rt}U(p_{t,h,x}, E + G_h + x)$ is the expected discounted continuation payoff, updating both the posterior $p_{t,h,x}$, and the intercept $E + G_h + x$ at time $t$. With this notation, the expected payoff is as follows:

(4.25)
$$\begin{aligned}
U(p_0, E) = {} & p_0 P_{\theta_1}(\tau < T)\Big[E_{\theta_1}\Big[\int_0^\tau re^{-rt}\,dY_{\theta_1}(t) \,\Big|\, \tau < T\Big] \\
&\quad + \int_{-E}^\infty \int_0^\infty f^{\theta_1}_{(B^{\mu_1}(\tau),\tau) \mid \tau < T}(x, t)\Big(re^{-rt}\frac{1}{\nu_1}\int \nu_1(dh)\,h + \frac{1}{\nu_1}\int \nu_1(dh)e^{-rt}U(p_{t,h,x}, E + G_h + x)\Big)\,dt\,dx\Big] \\
&+ p_0 P_{\theta_1}(\tau > T)\Big[E_{\theta_1}\Big[\int_0^T re^{-rt}\,dY_{\theta_1}(t) \,\Big|\, \tau > T\Big] + E_{\theta_1}\Big[\int_T^\infty re^{-rt}s\,dt \,\Big|\, \tau > T\Big]\Big] \\
&+ (1 - p_0)P_{\theta_2}(\tau < T)\Big[E_{\theta_2}\Big[\int_0^\tau re^{-rt}\,dY_{\theta_2}(t) \,\Big|\, \tau < T\Big] \\
&\quad + \int_{-E}^\infty \int_0^\infty f^{\theta_2}_{(B^{\mu_2}(\tau),\tau) \mid \tau < T}(x, t)\Big(re^{-rt}\frac{1}{\nu_2}\int \nu_2(dh)\,h + \frac{1}{\nu_2}\int \nu_2(dh)e^{-rt}U(p_{t,h,x}, E + G_h + x)\Big)\,dt\,dx\Big] \\
&+ (1 - p_0)P_{\theta_2}(\tau > T)\Big[E_{\theta_2}\Big[\int_0^T re^{-rt}\,dY_{\theta_2}(t) \,\Big|\, \tau > T\Big] + E_{\theta_2}\Big[\int_T^\infty re^{-rt}s\,dt \,\Big|\, \tau > T\Big]\Big].
\end{aligned}$$



By Eqs. (4.18), (4.19) and (4.20), this expression is equal to
MiPo(l - P 0i (t > T))(l - E Bl [e- rT \ T < T]j 
+PqP 6i (t < TjH x J X E J °° f(Bfi( T ),T)\T<T( x ' t)re~ rt dtdx 

+ Po P ei (t < T) j°° E /~ f^ 1(T) T)lT<T (x, *)i / e- rt C/(p t)h)X , S+Gffc+^^CdfcJAda! 
+MiPo-P ei (r > T)(l - E Sl [e- rT \r > T]j + spo^r > T^Je^T > T] 
+/ia(l ~ Po)Pe 2 (r > T))(l - £„ 2 [e"" |r < T]) 
+(l-Po)Pe 3 (r < T)H 2 f™ E f™f e { ^ 2{T)iT)lT<T {x,t)re- rt dtdx 
+(l-J*)Pfe(r<T)/^/~/£^ 

+» 2 (l-p )Pe 2 (T > T)(l-Eg 2 [e- rT \T > T} + s(l-p )P e2 (r > T)Eg 2 [er rT \ T > T]. 

By Eqs. P~2T]) , P~2"2"l) and P~2"g)) , this expression is equal to 

°i(r r >T) \ , ~ , . T s P 9l (r->T) 



mP0 P ei (r > T) (1 - j + spoPei (T > T ) 

4*o(l - PeAr > T))H ir ■ ■ ^ff/ 
+ Ml Po(l - flfc (r > T)) (l - ^ ■ V-P^gx) ) 
/i 2 (l -Po)Pe 2 (r >T)(l- P Z%>tJ ) + s{1 - p )Pg 2 {r > Tj P ^^j 

v 2 l-Pe 2 (T r >T) 



r »2 

+(l- Po )(l-Pe 2 (T>T))H 2 r 



v 2 +r l-P e . 2 (T>T) 
+ M2 (1 -p„)(l - Pe 2 (r > T)) (l - & • ^£
+ C E J OO Je^ U ( Pt , h , x ,E + G h + x)- 



>T) 
>T) 



PoP fll (T < T)f^ M ^ T ^ T<T (x, t)u\ (dh)/9i + (l- Po )Pe 2 (r < T)f { ^ 2{T) ^ T<T {x,t)v 2 {dh)/v 2 



dtdx 



= Pa 



+r T Pi+r j T ^ W ^P 2 +r T P 2 +r ) 

+p P ei {r* > T) (s - gfr - egg) + (1 - po)^ K > T) (s 



fi2r _ v-2 H2 r 
v 2 +r V2+r 



+ I-e IT I e- rt U(pt, h , x , E + G h + x)- 
p a Pe 1 (T < T)f { ^ 1{T) ^ T<T {x,t)vi{dh)/vi + {l-p Q )Pe 2 {T < T)f ( ^ 2(T)T)]T<T {x,t)v 2 (dh)/i) 2 

= Po(^ + ^) + (l-^o) + 

+ (1 - po)e -^ 2+V 2 ( ,2 + r )+F t 2) ^_^r__^ 

+ r E J OO Je- rt U(p t .,, Xl E + G h + xy 

■ PoPe 1 (r < T)tf^ 1{T)tT) \ T<T (x,t)v 1 (dh)/v 1 + (l-po)Pj a (T < T)f e { ^ 2{T)T)lT<T (x,t)v 2 (dh)/D 2 
Simplifying the last expression, the integral equation is

\[
U(p_0,E) = Ap_0 + B(1-p_0) + Cp_0e^{-m_1E} + D(1-p_0)e^{-m_2E}
+ \int_{-E}^{\infty}\!\int_0^{\infty}\!\int e^{-rt}\,U(p_{t,h,x}, E+G_h+x)\,g(x,t,h)\,dt\,dx,
\tag{IE}
\]

where A, B, C, and D are constants, and

\[
g(x,t,h) = p_0P_{\theta_1}(\tau<T)\,f_{(B^{\hat\mu_1}(\tau),\tau)\mid\tau<T}(x,t)\,\nu_1(dh)/\nu_1
+ (1-p_0)P_{\theta_2}(\tau<T)\,f_{(B^{\hat\mu_2}(\tau),\tau)\mid\tau<T}(x,t)\,\nu_2(dh)/\nu_2.
\tag{4.26}
\]
We now show that (IE) admits a unique solution U in the region $[0,1] \times [0,\infty)$.

Boundary values

First, we find the values of $U(p,E)$ on the boundary of the region $[0,1] \times [0,\infty]$.
Note that $U(p,0) = s$, and $U(p,\infty) = pg_1 + (1-p)g_2$ for every $p \in [0,1]$.
We now argue that there is a unique solution of (IE) when $p = 0$. $U(0,E)$ is a
function of $E$ with two boundary conditions, at $E = 0$ and at $E = \infty$. Suppose that
$U(0,E)$ and $V(0,E)$ both solve (IE). Then $W(0,E) := U(0,E) - V(0,E)$ satisfies

\[
W(0,E) = \int_{-E}^{\infty}\!\int_0^{\infty}\!\int e^{-rt}\,W(0,E+G_h+x)\,g(x,t,h)\,dt\,dx,
\]

and $W(0,0) = W(0,\infty) = 0$. Let $E$ be a point at which $W(0,\cdot)$ attains its maximum.
Assume to the contrary that $W(0,E) > 0$. By (4.23) it follows that
$\int_{-E}^{\infty}\int_0^{\infty}\int e^{-rt}g(x,t,h)\,dt\,dx < 1$. Therefore

\[
W(0,E) = \int_{-E}^{\infty}\!\int_0^{\infty}\!\int e^{-rt}\,W(0,E+G_h+x)\,g(x,t,h)\,dt\,dx
\le W(0,E)\int_{-E}^{\infty}\!\int_0^{\infty}\!\int e^{-rt}g(x,t,h)\,dt\,dx < W(0,E),
\]

a contradiction, which implies that $W(0,E) \le 0$ for every $E$. Similarly, one can obtain
that $W(0,E) \ge 0$, so that $W(0,E) = 0$, and the solution is unique.

Similar arguments show that (IE) admits a unique solution on $[0,\infty)$ when $p = 1$.

Since $U(p,E)$ is uniquely determined on the boundary of the region $[0,1] \times [0,\infty]$,
similar arguments show the uniqueness of the solution in $[0,1] \times [0,\infty)$. Using
Eqs. (4.16)-(4.24) one can verify that the solution of (IE) is

\[
U(p_0,E) = p_0g_1 + (1-p_0)g_2 + (s-g_1)\,p_0\,e^{-(\hat\mu_1-\hat\mu_2)(\alpha+1)E} + (s-g_2)(1-p_0)\,e^{-(\hat\mu_1-\hat\mu_2)\alpha E}.
\]

ln (i^) w e get Eq. (EH), as 



By substituting E 



Hi— Ma 

desired. □ 
4.5. Information Pricing. 

Lemma 4.6. Let $f_{a,b}(v)$ be the function defined in (2.8). The equation $f_{a,b}(v) = 0$
admits a unique solution in the interval $(0,\infty)$.

The proof is similar to the proof of Lemma 4.4 and is therefore omitted. We turn
to the proof of Theorem 2.6, which is analogous to the proofs of Theorems 2.3 and
2.5.

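Proof of Theorem 2.6. Suppose that until time $t$ the DM chose the risky arm
and observed the jumps $h_1^a, \ldots, h_n^a$ (resp. $h_1^b, \ldots, h_m^b$) from the process $(X^a(t))$ (resp.
$(X^b(t))$). Let $Y_B^a(t)$ (resp. $Y_B^b(t)$) be the Brownian motion with drift component
of $(X^a(t))$ (resp. $(X^b(t))$) at time $t$. Note that $Y_B^j(t) \sim N(\mu^jt, (\sigma^j)^2t)$, $j \in \{a,b\}$.
Let $q_t := P_t(\theta_1 \mid h_1^a, \ldots, h_n^a;\ h_1^b, \ldots, h_m^b;\ Y_B^a(t);\ Y_B^b(t);\ q_0)$ be the posterior belief of the
DM. The odds ratio of the posterior belief is

\[
\begin{aligned}
\frac{q_t}{1-q_t}
&= \frac{q_0\displaystyle\prod_{j\in\{a,b\}}\frac{1}{\sqrt{2\pi(\sigma^j)^2t}}\,e^{-\frac{(Y_B^j(t)-\mu_1^jt)^2}{2(\sigma^j)^2t}}\cdot e^{-\nu_1^at}\frac{(\nu_1^at)^n}{n!}\prod_{k=1}^{n}\frac{\nu_1^a(dh_k^a)}{\nu_1^a}\cdot e^{-\nu_1^bt}\frac{(\nu_1^bt)^m}{m!}\prod_{k=1}^{m}\frac{\nu_1^b(dh_k^b)}{\nu_1^b}}
{(1-q_0)\displaystyle\prod_{j\in\{a,b\}}\frac{1}{\sqrt{2\pi(\sigma^j)^2t}}\,e^{-\frac{(Y_B^j(t)-\mu_2^jt)^2}{2(\sigma^j)^2t}}\cdot e^{-\nu_2^at}\frac{(\nu_2^at)^n}{n!}\prod_{k=1}^{n}\frac{\nu_2^a(dh_k^a)}{\nu_2^a}\cdot e^{-\nu_2^bt}\frac{(\nu_2^bt)^m}{m!}\prod_{k=1}^{m}\frac{\nu_2^b(dh_k^b)}{\nu_2^b}} \\[4pt]
&= \frac{q_0\,e^{\mu_1^aY_B^a(t)/(\sigma^a)^2-(\mu_1^a)^2t/2(\sigma^a)^2}\,e^{\mu_1^bY_B^b(t)/(\sigma^b)^2-(\mu_1^b)^2t/2(\sigma^b)^2}\,e^{-\nu_1^at}\prod_{k=1}^{n}\nu_1^a(dh_k^a)\,e^{-\nu_1^bt}\prod_{k=1}^{m}\nu_1^b(dh_k^b)}
{(1-q_0)\,e^{\mu_2^aY_B^a(t)/(\sigma^a)^2-(\mu_2^a)^2t/2(\sigma^a)^2}\,e^{\mu_2^bY_B^b(t)/(\sigma^b)^2-(\mu_2^b)^2t/2(\sigma^b)^2}\,e^{-\nu_2^at}\prod_{k=1}^{n}\nu_2^a(dh_k^a)\,e^{-\nu_2^bt}\prod_{k=1}^{m}\nu_2^b(dh_k^b)}.
\end{aligned}
\tag{4.27}
\]

Indeed, $\frac{1}{\sqrt{2\pi(\sigma^a)^2t}}e^{-(Y_B^a(t)-\mu_i^at)^2/2(\sigma^a)^2t}$ (resp. $\frac{1}{\sqrt{2\pi(\sigma^b)^2t}}e^{-(Y_B^b(t)-\mu_i^bt)^2/2(\sigma^b)^2t}$) is the probability of
observing $Y_B^a(t)$ (resp. $Y_B^b(t)$), given the type $\theta_i$, and $e^{-\nu_i^at}\frac{(\nu_i^at)^n}{n!}\prod_{k=1}^{n}\frac{\nu_i^a(dh_k^a)}{\nu_i^a}$ (resp.
$e^{-\nu_i^bt}\frac{(\nu_i^bt)^m}{m!}\prod_{k=1}^{m}\frac{\nu_i^b(dh_k^b)}{\nu_i^b}$) is the probability of receiving the $n$ (resp. $m$) jumps that
occurred until time $t$ from $(X^a(t))$ (resp. $(X^b(t))$), given the type $\theta_i$. The first
equality in (4.27) is the Bayesian belief updating, using the independence of the
Levy processes $X^a(t)$ and $X^b(t)$, and the independence of the components in the
Levy-Ito decomposition, given the type of the risky arm, and the second equality
is obtained by eliminating common components.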
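The belief update in (4.27) is easiest to see on the log-odds scale, where each observed component contributes an additive log-likelihood ratio. The following is a minimal sketch, not the authors' code: the function names are hypothetical, and each jump is assumed to be summarised by the ratio of the High and Low Levy densities at the observed jump size.

```python
import math


def brownian_log_lr(y, t, mu_high, mu_low, sigma):
    """Log-likelihood ratio (High vs Low) of observing the continuous part y at time t."""
    return ((mu_high - mu_low) * y / sigma ** 2
            - (mu_high ** 2 - mu_low ** 2) * t / (2.0 * sigma ** 2))


def jump_log_lr(t, rate_high, rate_low, density_ratios):
    """Log-likelihood ratio of the jumps observed up to time t.

    density_ratios holds nu_1(dh_k) / nu_2(dh_k) for each observed jump (assumed given).
    """
    return -(rate_high - rate_low) * t + sum(math.log(r) for r in density_ratios)


def posterior_high(q0, log_lrs):
    """Posterior probability of High from the prior q0 and independent log-likelihood ratios."""
    log_odds = math.log(q0 / (1.0 - q0)) + sum(log_lrs)
    return 1.0 / (1.0 + math.exp(-log_odds))
```

For two observed processes one would pass one Brownian term and one jump term per process, for example `posterior_high(q0, [brownian_log_lr(...), jump_log_lr(...), brownian_log_lr(...), jump_log_lr(...)])`. Summing the terms relies on the conditional independence of the components given the type, as stated in the text.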



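Suppose the DM follows a cut-off strategy $k'$ with cut-off point $p'$. If $q_0 < p'$,
the DM will always choose the safe arm. If $q_0 > p'$, the DM will initially choose
the risky arm. Then the DM chooses the risky arm as long as $q_t > p'$, which, by
Eq. (2.4), is equivalent to

\[
\frac{q_0}{1-q_0}\times
\frac{e^{\mu_1^aY_B^a(t)/(\sigma^a)^2-(\mu_1^a)^2t/2(\sigma^a)^2-\nu_1^at}}{e^{\mu_2^aY_B^a(t)/(\sigma^a)^2-(\mu_2^a)^2t/2(\sigma^a)^2-\nu_2^at}}\times
\frac{e^{\mu_1^bY_B^b(t)/(\sigma^b)^2-(\mu_1^b)^2t/2(\sigma^b)^2-\nu_1^bt}}{e^{\mu_2^bY_B^b(t)/(\sigma^b)^2-(\mu_2^b)^2t/2(\sigma^b)^2-\nu_2^bt}}\times
\prod_{k=1}^{n}\frac{\nu_1^a(dh_k^a)}{\nu_2^a(dh_k^a)}\times\prod_{k=1}^{m}\frac{\nu_1^b(dh_k^b)}{\nu_2^b(dh_k^b)}
> \frac{p'}{1-p'}.
\tag{4.28}
\]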



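By taking the natural logarithm, and rearranging the resulting terms, we obtain
that this inequality is equivalent to

\[
\begin{aligned}
\frac{\mu_1^a-\mu_2^a}{(\sigma^a)^2}\,Y_B^a(t)+\frac{\mu_1^b-\mu_2^b}{(\sigma^b)^2}\,Y_B^b(t)
>{}& \left[\frac{(\mu_1^a)^2-(\mu_2^a)^2}{2(\sigma^a)^2}+\frac{(\mu_1^b)^2-(\mu_2^b)^2}{2(\sigma^b)^2}+(\nu_1^a-\nu_2^a)+(\nu_1^b-\nu_2^b)\right]t \\
&-\ln\frac{q_0}{1-q_0}+\ln\frac{p'}{1-p'}
-\sum_{k=1}^{n}\ln\frac{\nu_1^a(dh_k^a)}{\nu_2^a(dh_k^a)}-\sum_{k=1}^{m}\ln\frac{\nu_1^b(dh_k^b)}{\nu_2^b(dh_k^b)}.
\end{aligned}
\tag{4.29}
\]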



Since $Y_B^k(t) = \mu^kt + \sigma^kZ^k(t)$, $k \in \{a,b\}$, it follows that Eq. (4.29) is equivalent to
(4.30) 



B^'i 1 (t) : = 



(ji a 1 -ji a 2 )z a (t) + (fi b 1 -t4)z b (t) 

- m 2 + M - A2) 2 



> 



2 M °(m? - M|) ~ ((Mi£ - (AS) 2 ) + 2£ 6 (# - £|) - ((Mi) 2 - (mI) 2 ) 

V(m?-^) 2 + (Ai-mI) 2 

- ('» (A) - >» (l£)) - Si- >n (SSfj) - Ei- >» 



where /if = for j E {a, 6, c}, and i <G {1, 2}. Notice that (t) is a standard 

Brownian motion with drift: 



- m2) 2 + (AS - Ml) 2 
gTOf - gg) - ((m?) 2 - m 2 ) + 2m"(Mi - m|) - ((mi) 2 - (AS) 2 ) 

V(Mi - Mi) 2 + (Mi - Ml) 2 
We construct an integral equation similar to the one constructed in the preceding proof.
Let $\tau$ be the stopping time of the first jump from the process $(X^a(t) + X^b(t))$.
Let the stopping time $T$ be the first time $t$ that satisfies



.... -(-»(*) -*(*)) -E,-"(«) 

V(Af - Ai) 2 + (Ai - A) 2 



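Denote by
$G_h^a := \ln\!\left(\frac{\nu_1^a(dh^a)}{\nu_2^a(dh^a)}\right)\Big/\sqrt{(\hat\mu_1^a-\hat\mu_2^a)^2+(\hat\mu_1^b-\hat\mu_2^b)^2}$
(resp. $G_h^b := \ln\!\left(\frac{\nu_1^b(dh^b)}{\nu_2^b(dh^b)}\right)\Big/\sqrt{(\hat\mu_1^a-\hat\mu_2^a)^2+(\hat\mu_1^b-\hat\mu_2^b)^2}$)
the contribution of a jump of size $h^a$ (resp. $h^b$) received from the process $X^a(t)$ (resp. $X^b(t)$),
and by
$E := \left(\ln\frac{q_0}{1-q_0}-\ln\frac{p'}{1-p'}\right)\Big/\sqrt{(\hat\mu_1^a-\hat\mu_2^a)^2+(\hat\mu_1^b-\hat\mu_2^b)^2}$
the intercept of the right-hand side of Eq. (4.30) at $t = 0$.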

Constructing the integral equation. The DM chooses the risky arm until
the minimum of the time $\tau$ of the first jump and the stopping time $T$.
We distinguish between two cases. If the DM stops before the time $\tau$ of
the first jump, we calculate the expected discounted payoff from the process
$(X^a(t) + X^c(t))$ until the stopping time $T$, and the expected discounted payoff
from the safe arm after the stopping time $T$. If the first jump occurs before
the stopping time $T$, we calculate the expected discounted payoff received from
the process $(X^a(t) + X^c(t))$ until time $\tau$. If the first jump was received from the
process $(X^a(t))$, we add the expected discounted payoff from the first jump, and
the expected discounted continuation payoff, updating both the posterior $p^a_{t,h,x}$
and the intercept $E + G_h^a + x$ according to the time of the first jump, the first
jump's size, and the value of the continuous part of the process $(X^a(t) + X^b(t))$;
while if the first jump was received from the process $(X^b(t))$, we add only the
expected discounted continuation payoff, updating both the posterior $p^b_{t,h,x}$ and
the intercept $E + G_h^b + x$ according to the time of the first jump, the first jump's
size, and the value of the continuous part of the process $(X^a(t) + X^b(t))$.

The posterior $p^j_{t,h,x}$ is updated as follows:


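\[
\begin{aligned}
p^j_{t,h,x} :={}& P\big(\theta_1 \mid \tau<T,\ \tau=t,\ B^{\hat\mu^a,\hat\mu^b}(\tau)\in dx,\ h^j\big) \\
={}& \frac{P\big(\tau<T,\ \tau=t,\ B^{\hat\mu^a,\hat\mu^b}(\tau)\in dx,\ h^j \mid \theta_1\big)P(\theta_1)}
{P\big(\tau<T,\ \tau=t,\ B^{\hat\mu^a,\hat\mu^b}(\tau)\in dx,\ h^j \mid \theta_1\big)P(\theta_1)+P\big(\tau<T,\ \tau=t,\ B^{\hat\mu^a,\hat\mu^b}(\tau)\in dx,\ h^j \mid \theta_2\big)P(\theta_2)} \\
={}& \frac{q_0P_{\theta_1}(\tau<T)\,f^{\theta_1}_{(B^{\hat\mu^a,\hat\mu^b}(\tau),\tau)\mid\tau<T}(x,t)\,\dfrac{\nu_1^j(dh^j)}{\nu_1^a+\nu_1^b}}
{q_0P_{\theta_1}(\tau<T)\,f^{\theta_1}_{(B^{\hat\mu^a,\hat\mu^b}(\tau),\tau)\mid\tau<T}(x,t)\,\dfrac{\nu_1^j(dh^j)}{\nu_1^a+\nu_1^b}+(1-q_0)P_{\theta_2}(\tau<T)\,f^{\theta_2}_{(B^{\hat\mu^a,\hat\mu^b}(\tau),\tau)\mid\tau<T}(x,t)\,\dfrac{\nu_2^j(dh^j)}{\nu_2^a+\nu_2^b}}.
\end{aligned}
\]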
The integral equation is as follows:
(4.32) 

U(p ,E)=p P ei (T<T) 



Eg 1 


[L 







T<T 


+ E 01 


f re- rt dY c (t) 


T<T 






Jo 





OO POO 



+ ./_ E ./n ^W(r),-)k<T (M)fe "V+rf 



v 1 H 1 



dtdx 



-E JQ 

OO />OG 



E JO 

OO poo 



dtdx 



E JO 









+ E 01 


[ re- rt dY c {t) 


T>T 




Jo 





+ PoP ei (r>T) B 0l / re rt dY^(t) t > T 
\_ Jo 

+E 01 / re~ rt sdt t>T 

+ (1- Po )Po 2 (t <T) E 

f 02 (x t)rc~ rt ~ 2 " 2 



dtdx 



re~ rt dY£{t) 



T < T 


+ Ee 2 


f re- rt dY c (t) 


T < T 






Jo 





OO /* OO 



r,a Tja 

~ rt 2 2 -dtdx 



-E JO 
oo />oo 



E JO 

OO /"OO 



E JO 



/ ( V ; ^ (T))r)|T<T ^,*)e- rt / ns- ,,....,.£ + a;; ,) 



v%(dh a ) 

l>2 + 4 
4(dh») 



dtdx 



dtdx 



+ (1 - Po )Po 2 (t > T) 



En 



re- rt dY£(t) 











T>T 


+ Eg 2 


[ re- rt dY c (t) 


T>T 






Jo 





+Eg, 



re~ rt sdt 



T>T 



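Similarly to Eqs. (4.19) and (4.20), using the stochastic integral properties for Levy
processes, it follows that for $i \in \{1,2\}$

\[
E_{\theta_i}\Big[\int_0^T re^{-rt}\,dY^c(t)\,\Big|\,\tau>T\Big] = g_i^c\big(1-E_{\theta_i}[e^{-rT} \mid \tau>T]\big),
\]

and

\[
E_{\theta_i}\Big[\int_0^\tau re^{-rt}\,dY^c(t)\,\Big|\,\tau<T\Big] = g_i^c\big(1-E_{\theta_i}[e^{-r\tau} \mid \tau<T]\big).
\]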



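By these equations, and Eqs. (4.17)-(4.24), one can verify that for every $q_0 > p'$,
the unique solution of Eq. (4.32) is

\[
\begin{aligned}
U(p_0,E) ={}& (s-g_1^a-g_1^c)\,p_0\,e^{-(\beta+1)\sqrt{(\hat\mu_1^a-\hat\mu_2^a)^2+(\hat\mu_1^b-\hat\mu_2^b)^2}\,E} \\
&+(s-g_2^a-g_2^c)(1-p_0)\,e^{-\beta\sqrt{(\hat\mu_1^a-\hat\mu_2^a)^2+(\hat\mu_1^b-\hat\mu_2^b)^2}\,E} \\
&+p_0(g_1^a+g_1^c)+(1-p_0)(g_2^a+g_2^c).
\end{aligned}
\]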
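Therefore, for every $q_0 > p'$, the expected discounted payoff is

\[
\begin{aligned}
V_{p'}(p_0,q_0) ={}& (s-g_1^a-g_1^c)\,p_0\left(\frac{1-q_0}{q_0}\right)^{\beta+1}\left(\frac{p'}{1-p'}\right)^{\beta+1}
+(s-g_2^a-g_2^c)(1-p_0)\left(\frac{1-q_0}{q_0}\right)^{\beta}\left(\frac{p'}{1-p'}\right)^{\beta} \\
&+p_0(g_1^a+g_1^c)+(1-p_0)(g_2^a+g_2^c),
\end{aligned}
\]

as desired.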

Now, if $q_0 = p_0$, using the same method as that used to prove Theorem 2.3, one
can show that the optimal cut-off strategy has cut-off point

\[
\frac{\beta(s-g_2^a-g_2^c)}{(\beta+1)(g_1^a+g_1^c-s)+\beta(s-g_2^a-g_2^c)}.
\]

□
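The cut-off and the payoff under a possibly incorrect prior can be evaluated numerically. The sketch below is illustrative only: it takes the exponent $\beta$ as a given input (it does not solve Eq. (2.8)), writes `g_high` and `g_low` for the total expected payoff rates $g_1^a + g_1^c$ and $g_2^a + g_2^c$, and uses hypothetical function names.

```python
def optimal_cutoff(beta, g_high, g_low, s):
    """Cut-off point beta*(s - g_low) / ((beta+1)*(g_high - s) + beta*(s - g_low))."""
    return beta * (s - g_low) / ((beta + 1.0) * (g_high - s) + beta * (s - g_low))


def payoff_with_prior(p0, q0, cutoff, beta, g_high, g_low, s):
    """Expected discounted payoff when the true prior is p0, the DM believes q0,
    and the DM switches to the safe arm once the belief reaches `cutoff`."""
    if q0 <= cutoff:
        return s  # the DM plays the safe arm from the start
    ratio = (1.0 - q0) / q0 * cutoff / (1.0 - cutoff)
    return (p0 * g_high + (1.0 - p0) * g_low
            + (s - g_high) * p0 * ratio ** (beta + 1.0)
            + (s - g_low) * (1.0 - p0) * ratio ** beta)
```

With the illustrative values `beta = 1`, `g_high = 2`, `g_low = 0`, `s = 1`, the cut-off is 1/3, and for `p0 = q0 = 0.6` the payoff at this cut-off (about 1.2667) exceeds the payoff at the cut-offs 0.2 and 0.5, as one would expect from optimality. At `q0 = p0 = cutoff` the formula returns exactly `s`, which reflects value matching at the boundary.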



Proof of Corollary \2.1\ Let (3 a be the unique solution of 



in (0, oo), and let f3 a ,b be the unique solution of Eq. (|2.8[) . Using similar arguments 
to those used to prove Lemma 14.41 one can show that (3 a ,b < Pa- Let p* c — 

m , 1U «1 { \~ 9 \Z 9 }) n — be the optimal cut-off of DM2. By Theorem [231 the 

optimal expected payoff of DM2 is given by 

U D M2{po) = V^ c c {po,po) 

= f S '^P0<P*a. c , 

\po(9t +9t) + (1 -Po)(5 2 a +3 2 c ) + Cp a (l-p o )(^)0" if Po > P* a , c , 
where C Pa = "sl-^-Pl. M-gl-g^) 



By Theorem 2.6, the expected payoff of DM1, using the same cut-off point $p^*_{a,c}$,
is given by



Vp,' b ' c {p ,Po) 



S if PO < P*a. c , 

poigl + gd + (i -poM +g c 2 ) + C Pa , b (i -Po)(^f y 3 - 6 if po >p:, c , 



Pa 

s-ffS -9l -Pa,c(9l -Sl-92 -92) 



where C, 



(1-Pa.c) 



Since /3 a , h < /3 a , it follows that if p > Pa,o tnen U DM 2{po) = V^(p ,Po) < 
Vp/' c (po;Po) < U DM i(po), as desired. 

□ 

4.6. Optimism vs Pessimism. 



Proof of Theorem [Ql Substituting q = p + e in Eq. (|2~7)) yields 
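\[
\begin{aligned}
V_{p_0}(\varepsilon) ={}& p_0g_1+(1-p_0)g_2+p_0(s-g_1)\left(\frac{p^*}{1-p^*}\right)^{\alpha+1}\left(\frac{1-p_0-\varepsilon}{p_0+\varepsilon}\right)^{\alpha+1} \\
&+(1-p_0)(s-g_2)\left(\frac{p^*}{1-p^*}\right)^{\alpha}\left(\frac{1-p_0-\varepsilon}{p_0+\varepsilon}\right)^{\alpha},
\end{aligned}
\]

where $\alpha$ is the unique solution of (2.2) in $(0,\infty)$.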



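Simple algebraic manipulations, using $\frac{p^*}{1-p^*} = \frac{\alpha(s-g_2)}{(\alpha+1)(g_1-s)}$, yield

\[
\begin{aligned}
V_{p_0}(\varepsilon) ={}& [p_0g_1+(1-p_0)g_2]
+\left(\frac{p^*}{1-p^*}\right)^{\alpha}\left[p_0(s-g_1)\frac{p^*}{1-p^*}\left(\frac{1-p_0-\varepsilon}{p_0+\varepsilon}\right)^{\alpha+1}
+(1-p_0)(s-g_2)\left(\frac{1-p_0-\varepsilon}{p_0+\varepsilon}\right)^{\alpha}\right] \\
={}& [p_0g_1+(1-p_0)g_2]
+\left(\frac{\alpha}{\alpha+1}\right)^{\alpha}\left(\frac{s-g_2}{g_1-s}\right)^{\alpha}\left[-p_0(s-g_2)\frac{\alpha}{\alpha+1}\left(\frac{1-p_0-\varepsilon}{p_0+\varepsilon}\right)^{\alpha+1}
+(1-p_0)(s-g_2)\left(\frac{1-p_0-\varepsilon}{p_0+\varepsilon}\right)^{\alpha}\right] \\
={}& [p_0g_1+(1-p_0)g_2]
+\left(\frac{\alpha}{\alpha+1}\right)^{\alpha}\left(\frac{s-g_2}{g_1-s}\right)^{\alpha}p_0(s-g_2)\left[-\frac{\alpha}{\alpha+1}\left(\frac{1-p_0-\varepsilon}{p_0+\varepsilon}\right)^{\alpha+1}
+\frac{1-p_0}{p_0}\left(\frac{1-p_0-\varepsilon}{p_0+\varepsilon}\right)^{\alpha}\right].
\end{aligned}
\]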



For every $x > 0$ define $W(x) = V_p(x) - V_p(-x)$. This is the difference between the
payoff of an optimist and the payoff of a pessimist. Straightforward calculations
show that

\[
W'(x) = -\alpha x\left(\frac{\alpha}{\alpha+1}\right)^{\alpha}\left(\frac{s-g_2}{g_1-s}\right)^{\alpha}(s-g_2)
\left[\left(\frac{1-(p+x)}{p+x}\right)^{\alpha}\frac{1}{(p+x)^2(1-(p+x))}
-\left(\frac{1-(p-x)}{p-x}\right)^{\alpha}\frac{1}{(p-x)^2(1-(p-x))}\right],
\]

so that $W(0) = W'(0) = 0$. Suppose $\alpha \ge 1$. Since $\frac{1}{(p+x)^{\alpha+2}} < \frac{1}{(p-x)^{\alpha+2}}$ and
$(1-(p+x))^{\alpha-1} \le (1-(p-x))^{\alpha-1}$, we get $W'(x) > 0$ for every $x > 0$ such that
$p^* < p \pm x < 1$, and so $W(x) > 0$; an optimist fares better.

If $0 < \alpha < 1$, it is easy to verify that $p^* < \frac{\alpha+2}{3}$. Since the function $y \mapsto \frac{(1-y)^{\alpha-1}}{y^{\alpha+2}}$
decreases for $0 < y < \frac{\alpha+2}{3}$ and increases for $\frac{\alpha+2}{3} < y < 1$, it follows that for every
$x > 0$ such that $p^* < p \pm x < \frac{\alpha+2}{3}$ we get $W'(x) > 0$, and so an optimist fares better,
while for every $x > 0$ such that $\frac{\alpha+2}{3} < p \pm x$ we get $W'(x) < 0$, and so $W(x) < 0$; a
pessimist fares better. □
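The comparison between an optimist and a pessimist can be checked numerically. The sketch below evaluates $W(x) = V_p(x) - V_p(-x)$ directly from the payoff formula obtained by substituting $q = p_0 + \varepsilon$; it treats $\alpha$ as a given input rather than solving Eq. (2.2), and the function names and parameter values are illustrative assumptions.

```python
def payoff_under_belief(p0, q, alpha, g1, g2, s):
    """Expected payoff when the true prior is p0 and the DM acts on the belief q,
    using the cut-off p* that is optimal for the exponent alpha."""
    p_star = alpha * (s - g2) / ((alpha + 1.0) * (g1 - s) + alpha * (s - g2))
    if q <= p_star:
        return s  # the DM never tries the risky arm
    ratio = p_star / (1.0 - p_star) * (1.0 - q) / q
    return (p0 * g1 + (1.0 - p0) * g2
            + p0 * (s - g1) * ratio ** (alpha + 1.0)
            + (1.0 - p0) * (s - g2) * ratio ** alpha)


def optimism_gain(p0, x, alpha, g1, g2, s):
    """W(x): payoff of an optimist (belief p0 + x) minus that of a pessimist (belief p0 - x)."""
    return (payoff_under_belief(p0, p0 + x, alpha, g1, g2, s)
            - payoff_under_belief(p0, p0 - x, alpha, g1, g2, s))
```

For example, with `g1 = 2`, `g2 = 0`, `s = 1`, the call `optimism_gain(0.6, 0.1, 2.0, 2.0, 0.0, 1.0)` is positive (the optimist does better when $\alpha \ge 1$), while `optimism_gain(0.92, 0.05, 0.5, 2.0, 0.0, 1.0)` is negative, since both beliefs then lie above $(\alpha+2)/3$.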
5. Figures

In Figure 1 we depict a generic path of the Levy payoff process (Y(t)). The 
contribution of the compound Poisson process (the jumps) appears in Figure 2. 
The posterior belief given the observations appears in Figure 3. Finally, Figure 4 
shows the time-dependent cut-off, as well as the continuous part of the process (the 
Brownian motion with drift). 
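A path of the kind depicted in the figures can be generated with a short simulation. This is a minimal sketch under assumptions that are not taken from the paper: the jump part is a compound Poisson process with exponentially distributed jump sizes by default, the path is sampled on a uniform grid, and all names and parameter values are hypothetical.

```python
import math
import random


def simulate_levy_path(drift, sigma, jump_rate, horizon=1.0, steps=1000,
                       seed=0, jump_size=None):
    """Simulate Brownian motion with drift plus a compound Poisson process on a grid.

    Returns (times, continuous_part, jump_part, payoff), each of length steps + 1.
    """
    rng = random.Random(seed)
    if jump_size is None:
        jump_size = lambda r: r.expovariate(1.0)  # assumed jump-size law
    dt = horizon / steps

    # Jump times of a Poisson process: cumulative sums of exponential gaps.
    jump_times = []
    if jump_rate > 0:
        t = rng.expovariate(jump_rate)
        while t <= horizon:
            jump_times.append(t)
            t += rng.expovariate(jump_rate)

    times, continuous, jumps = [0.0], [0.0], [0.0]
    next_jump = 0
    for i in range(1, steps + 1):
        t = i * dt
        increment = drift * dt + sigma * math.sqrt(dt) * rng.gauss(0.0, 1.0)
        continuous.append(continuous[-1] + increment)
        total = jumps[-1]
        while next_jump < len(jump_times) and jump_times[next_jump] <= t:
            total += jump_size(rng)
            next_jump += 1
        jumps.append(total)
        times.append(t)
    payoff = [c + j for c, j in zip(continuous, jumps)]
    return times, continuous, jumps, payoff
```

Plotting `payoff`, `jumps`, and `continuous` against `times` gives pictures analogous to the payoff process, the Poisson arrivals, and the continuous part, respectively.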



Figure 1. The payoff process (horizontal axis: time (t))

Figure 2. The Poisson arrivals (horizontal axis: time (t))

Figure 3. The posterior process

Figure 4. The strategy description (horizontal axis: time (t))